
aiohttp concurrent GET requests lead to ClientConnectorError(8, 'nodename nor servname provided, or not known')

  • Brad Solomon · 6 years ago

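    I'm stumped by an issue that seems related to asyncio + aiohttp whereby, when sending a large number of concurrent GET requests, over 85% of the requests raise an aiohttp.client_exceptions.ClientConnectorError exception that ultimately stems from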

    socket.gaierror(8, 'nodename nor servname provided, or not known')
    

    This exception is not raised when sending a single GET request, or when doing the underlying DNS resolution on the host/port.

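    In my real code I do a good amount of customization, such as using a custom TCPConnector instance, but I can reproduce the issue using just the "default" aiohttp class instances and arguments, exactly as below.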

    I've followed the traceback, and the root of the exception is related to DNS resolution. It comes from the _create_direct_connection method of aiohttp.TCPConnector, which calls ._resolve_host().
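    Judging by the traceback later in this post, the default resolver step boils down to loop.getaddrinfo(host, port, type=socket.SOCK_STREAM, family=family), which asyncio runs in its default thread-pool executor. The following is a stdlib-only approximation of that step, not aiohttp's actual code: the function name resolve_like_threaded_resolver is hypothetical, and the dict layout simply mirrors the _resolve_host() output shown further down.

```python
import asyncio
import socket


async def resolve_like_threaded_resolver(host, port, family=0):
    """Approximate the threaded resolution step seen in the traceback.

    Hypothetical helper: calls loop.getaddrinfo(), which asyncio runs in its
    default ThreadPoolExecutor, and repackages each result as a dict shaped
    like the _resolve_host() output in the question.
    """
    loop = asyncio.get_running_loop()
    infos = await loop.getaddrinfo(host, port, type=socket.SOCK_STREAM, family=family)
    hosts = []
    for fam, _type, proto, _canonname, address in infos:
        hosts.append({
            "hostname": host,
            "host": address[0],
            "port": address[1],
            "family": fam,
            "proto": proto,
            "flags": socket.AI_NUMERICHOST,
        })
    return hosts


if __name__ == "__main__":
    # An IP literal resolves without any DNS traffic, so this runs offline.
    print(asyncio.run(resolve_like_threaded_resolver("127.0.0.1", 443, family=socket.AF_INET)))
```

    Passing family=socket.AF_INET here corresponds to the family argument mentioned in the list of things tried below.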

    I have also tried:

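    • Using (and not using) aiodns
    • sudo killall -HUP mDNSResponder
    • Using family=socket.AF_INET as an argument to TCPConnector (though I'm fairly sure this is used by aiodns anyway). This uses 2 rather than the default int 0 for that param
    • ssl=True and ssl=False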

    All to no avail.


    Below is the full code to reproduce. The input URLs are at https://gist.github.com/bsolomon1124/fc625b624dd26ad9b5c39ccb9e230f5a.

    import asyncio
    import itertools
    
    import aiohttp
    import aiohttp.client_exceptions
    
    from yarl import URL
    
    ua = itertools.cycle(
        (
            "Mozilla/5.0 (X11; Linux i686; rv:64.0) Gecko/20100101 Firefox/64.0",
            "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.10; rv:62.0) Gecko/20100101 Firefox/62.0",
            "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.13; ko; rv:1.9.1b2) Gecko/20081201 Firefox/60.0",
            "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36"
        )
    )
    
    async def get(url, session) -> str:
        async with await session.request(
            "GET",
            url=url,
            raise_for_status=True,
            headers={'User-Agent': next(ua)},
            ssl=False
        ) as resp:
            text = await resp.text(encoding="utf-8", errors="replace")
            print("Got text for URL", url)
            return text
    
    async def bulk_get(urls) -> list:
        async with aiohttp.ClientSession() as session:
            htmls = await asyncio.gather(
                *(
                    get(url=url, session=session)
                    for url in urls
                ),
                return_exceptions=True
            )
            return htmls
    
    
    # See https://gist.github.com/bsolomon1124/fc625b624dd26ad9b5c39ccb9e230f5a
    with open("/path/to/urls.txt") as f:
        urls = tuple(URL(i.strip()) for i in f)
    
    res = asyncio.run(bulk_get(urls))  # urls: Tuple[yarl.URL]
    
    c = 0
    for i in res:
        if isinstance(i, aiohttp.client_exceptions.ClientConnectorError):
            print(i)
            c += 1
    
    print(c)  # 21205 !!!!! (85% failure rate)
    print(len(urls))  # 24934
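    The counting loop above only looks for ClientConnectorError. A slightly more general summary, sketched here with a hypothetical summarize helper, groups every entry of res by exception class, so other failure modes (timeouts, ClientResponseError from raise_for_status=True, and so on) would show up too. The fake results in the __main__ block are placeholders; on real data you would call summarize(res).

```python
import collections


def summarize(results):
    """Count gather(..., return_exceptions=True) results by outcome type.

    Non-exception results (the HTML strings) are counted as "ok"; anything
    that is an exception instance is counted under its class name.
    """
    counts = collections.Counter()
    for item in results:
        if isinstance(item, BaseException):
            counts[type(item).__name__] += 1
        else:
            counts["ok"] += 1
    return counts


if __name__ == "__main__":
    fake_res = ["<html></html>", OSError("Cannot connect to host"), TimeoutError()]
    print(summarize(fake_res))
```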
    

    Printing the string of each exception in res gives output like:

    Cannot connect to host sigmainvestments.com:80 ssl:False [nodename nor servname provided, or not known]
    Cannot connect to host giaoducthoidai.vn:443 ssl:False [nodename nor servname provided, or not known]
    Cannot connect to host chauxuannguyen.org:80 ssl:False [nodename nor servname provided, or not known]
    Cannot connect to host www.baohomnay.com:443 ssl:False [nodename nor servname provided, or not known]
    Cannot connect to host www.soundofhope.org:80 ssl:False [nodename nor servname provided, or not known]
    # And so on...
    

    What's frustrating is that I can ping these hosts with no problem, and can even call the underlying ._resolve_host():

    Bash/shell:

     [~/] $ ping -c 5 www.hongkongfp.com
    PING www.hongkongfp.com (104.20.232.8): 56 data bytes
    64 bytes from 104.20.232.8: icmp_seq=0 ttl=56 time=11.667 ms
    64 bytes from 104.20.232.8: icmp_seq=1 ttl=56 time=12.169 ms
    64 bytes from 104.20.232.8: icmp_seq=2 ttl=56 time=12.135 ms
    64 bytes from 104.20.232.8: icmp_seq=3 ttl=56 time=12.235 ms
    64 bytes from 104.20.232.8: icmp_seq=4 ttl=56 time=14.252 ms
    
    --- www.hongkongfp.com ping statistics ---
    5 packets transmitted, 5 packets received, 0.0% packet loss
    round-trip min/avg/max/stddev = 11.667/12.492/14.252/0.903 ms
    

    Python:

    In [1]: import asyncio 
       ...: from aiohttp.connector import TCPConnector 
       ...: from clipslabapp.ratemgr import default_aiohttp_tcpconnector 
       ...:  
       ...:  
       ...: async def main(): 
       ...:     conn = default_aiohttp_tcpconnector() 
       ...:     i = await asyncio.create_task(conn._resolve_host(host='www.hongkongfp.com', port=443)) 
       ...:     return i 
       ...:  
       ...: i = asyncio.run(main())                                                                                                                               
    
    In [2]: i                                                                                                                                                     
    Out[2]: 
    [{'hostname': 'www.hongkongfp.com',
      'host': '104.20.232.8',
      'port': 443,
      'family': <AddressFamily.AF_INET: 2>,
      'proto': 6,
      'flags': <AddressInfo.AI_NUMERICHOST: 4>},
     {'hostname': 'www.hongkongfp.com',
      'host': '104.20.233.8',
      'port': 443,
      'family': <AddressFamily.AF_INET: 2>,
      'proto': 6,
      'flags': <AddressInfo.AI_NUMERICHOST: 4>}]
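    Since a single lookup succeeds, the open question is whether lookups start failing only when thousands run at once. One way to check that with no HTTP involved is to fire the same getaddrinfo() calls concurrently and tally the outcomes. This is a sketch under assumptions: resolve_many and max_in_flight are hypothetical names, and in practice you would feed it the hostnames from the URL list rather than the loopback literal used here.

```python
import asyncio
import collections
import socket


async def resolve_many(hosts, port=443, max_in_flight=None):
    """Resolve many hostnames concurrently and count outcomes.

    max_in_flight=None fires every lookup at once (like the question's
    gather); an int bounds concurrency with an asyncio.Semaphore.
    """
    loop = asyncio.get_running_loop()
    sem = asyncio.Semaphore(max_in_flight) if max_in_flight else None

    async def one(host):
        if sem is None:
            return await loop.getaddrinfo(host, port, type=socket.SOCK_STREAM)
        async with sem:
            return await loop.getaddrinfo(host, port, type=socket.SOCK_STREAM)

    results = await asyncio.gather(*(one(h) for h in hosts), return_exceptions=True)
    tally = collections.Counter()
    for r in results:
        if isinstance(r, socket.gaierror):
            tally[f"gaierror {r.errno}"] += 1
        elif isinstance(r, BaseException):
            tally[type(r).__name__] += 1
        else:
            tally["ok"] += 1
    return tally


if __name__ == "__main__":
    print(asyncio.run(resolve_many(["127.0.0.1"] * 50, max_in_flight=10)))
```

    Comparing the tally with and without max_in_flight, on real hostnames, would indicate whether resolution itself degrades under load.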
    

    My setup:

    • Python 3.7.1
    • aiohttp 3.5.4
    • Occurs on Mac OSX High Sierra and Ubuntu 18.04

    Information about the exception itself:

    The exception is aiohttp.client_exceptions.ClientConnectorError, which wraps socket.gaierror as the underlying OSError.

    Since I have return_exceptions=True in asyncio.gather(), I can get at the exception instances themselves for inspection. Here is one example:

    In [18]: i
    Out[18]:
    aiohttp.client_exceptions.ClientConnectorError(8,
                                                   'nodename nor servname provided, or not known')
    
    In [19]: i.host, i.port
    Out[19]: ('www.hongkongfp.com', 443)
    
    In [20]: i._conn_key
    Out[20]: ConnectionKey(host='www.hongkongfp.com', port=443, is_ssl=True, ssl=False, proxy=None, proxy_auth=None, proxy_headers_hash=None)
    
    In [21]: i._os_error
    Out[21]: socket.gaierror(8, 'nodename nor servname provided, or not known')
    
    In [22]: raise i.with_traceback(i.__traceback__)
    ---------------------------------------------------------------------------
    gaierror                                  Traceback (most recent call last)
    ~/Scripts/python/projects/clab/lib/python3.7/site-packages/aiohttp/connector.py in _create_direct_connection(self, req, traces, timeout, client_error)
        954                 port,
    --> 955                 traces=traces), loop=self._loop)
        956         except OSError as exc:
    
    ~/Scripts/python/projects/clab/lib/python3.7/site-packages/aiohttp/connector.py in _resolve_host(self, host, port, traces)
        824                 addrs = await \
    --> 825                     self._resolver.resolve(host, port, family=self._family)
        826                 if traces:
    
    ~/Scripts/python/projects/clab/lib/python3.7/site-packages/aiohttp/resolver.py in resolve(self, host, port, family)
         29         infos = await self._loop.getaddrinfo(
    ---> 30             host, port, type=socket.SOCK_STREAM, family=family)
         31
    
    /usr/local/Cellar/python/3.7.0/Frameworks/Python.framework/Versions/3.7/lib/python3.7/asyncio/base_events.py in getaddrinfo(self, host, port, family, type, proto, flags)
        772         return await self.run_in_executor(
    --> 773             None, getaddr_func, host, port, family, type, proto, flags)
        774
    
    /usr/local/Cellar/python/3.7.0/Frameworks/Python.framework/Versions/3.7/lib/python3.7/concurrent/futures/thread.py in run(self)
         56         try:
    ---> 57             result = self.fn(*self.args, **self.kwargs)
         58         except BaseException as exc:
    
    /usr/local/Cellar/python/3.7.0/Frameworks/Python.framework/Versions/3.7/lib/python3.7/socket.py in getaddrinfo(host, port, family, type, proto, flags)
        747     addrlist = []
    --> 748     for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
        749         af, socktype, proto, canonname, sa = res
    
    gaierror: [Errno 8] nodename nor servname provided, or not known
    
    The above exception was the direct cause of the following exception:
    
    ClientConnectorError                      Traceback (most recent call last)
    <ipython-input-22-72402d8c3b31> in <module>
    ----> 1 raise i.with_traceback(i.__traceback__)
    
    <ipython-input-1-2bc0f5172de7> in get(url, session)
         19         raise_for_status=True,
         20         headers={'User-Agent': next(ua)},
    ---> 21         ssl=False
         22     ) as resp:
         23         return await resp.text(encoding="utf-8", errors="replace")
    
    ~/Scripts/python/projects/clab/lib/python3.7/site-packages/aiohttp/client.py in _request(self, method, str_or_url, params, data, json, cookies, headers, skip_auto_headers, auth, allow_redirects, max_redirects, compress, chunked, expect100, raise_for_status, read_until_eof, proxy, proxy_auth, timeout, verify_ssl, fingerprint, ssl_context, ssl, proxy_headers, trace_request_ctx)
        474                                 req,
        475                                 traces=traces,
    --> 476                                 timeout=real_timeout
        477                             )
        478                     except asyncio.TimeoutError as exc:
    
    ~/Scripts/python/projects/clab/lib/python3.7/site-packages/aiohttp/connector.py in connect(self, req, traces, timeout)
        520
        521             try:
    --> 522                 proto = await self._create_connection(req, traces, timeout)
        523                 if self._closed:
        524                     proto.close()
    
    ~/Scripts/python/projects/clab/lib/python3.7/site-packages/aiohttp/connector.py in _create_connection(self, req, traces, timeout)
        852         else:
        853             _, proto = await self._create_direct_connection(
    --> 854                 req, traces, timeout)
        855
        856         return proto
    
    ~/Scripts/python/projects/clab/lib/python3.7/site-packages/aiohttp/connector.py in _create_direct_connection(self, req, traces, timeout, client_error)
        957             # in case of proxy it is not ClientProxyConnectionError
        958             # it is problem of resolving proxy ip itself
    --> 959             raise ClientConnectorError(req.connection_key, exc) from exc
        960
        961         last_exc = None  # type: Optional[Exception]
    
    ClientConnectorError: Cannot connect to host www.hongkongfp.com:443 ssl:False [nodename nor servname provided, or not known
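    Because ClientConnectorError is raised "from" the original socket.gaierror (the raise ... from exc line in the traceback), the low-level error can also be recovered generically through __cause__, without touching the private _os_error attribute. The sketch below does that and names the resolver error code. Note that the 8 in the message is EAI_NONAME on macOS, whereas on Linux/glibc EAI_NONAME is a negative number, so comparing against the socket constants is more portable than hard-coding 8. The helper names and the ConnectorErrorStandIn class are hypothetical stand-ins; on real results you would loop over res and call find_gaierror(e) on each exception.

```python
import socket


def find_gaierror(exc):
    """Return the first socket.gaierror in exc's __cause__/__context__ chain, or None."""
    seen = set()
    while exc is not None and id(exc) not in seen:
        seen.add(id(exc))
        if isinstance(exc, socket.gaierror):
            return exc
        exc = exc.__cause__ or exc.__context__
    return None


def eai_name(err):
    """Map a gaierror's errno to the name of the matching socket.EAI_* constant."""
    for name in ("EAI_NONAME", "EAI_AGAIN", "EAI_FAIL", "EAI_SERVICE"):
        if err.errno == getattr(socket, name, object()):
            return name
    return f"errno {err.errno}"


class ConnectorErrorStandIn(OSError):
    """Hypothetical stand-in for ClientConnectorError (which is also an OSError subclass)."""


def demo():
    # Reproduce the "raise ... from exc" wrapping without any network access.
    try:
        try:
            raise socket.gaierror(socket.EAI_NONAME, "nodename nor servname provided, or not known")
        except socket.gaierror as exc:
            raise ConnectorErrorStandIn("Cannot connect to host") from exc
    except ConnectorErrorStandIn as wrapped:
        return wrapped


if __name__ == "__main__":
    root = find_gaierror(demo())
    print(eai_name(root))  # EAI_NONAME
```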
    

    Why don't I think this is an OS-level DNS resolution problem?

    I can successfully ping the IP addresses of my ISP's DNS servers, which are given in (Mac OSX) System Preferences > Network > DNS:

     [~/] $ ping -c 2 75.75.75.75
    PING 75.75.75.75 (75.75.75.75): 56 data bytes
    64 bytes from 75.75.75.75: icmp_seq=0 ttl=57 time=16.478 ms
    64 bytes from 75.75.75.75: icmp_seq=1 ttl=57 time=21.042 ms
    
    --- 75.75.75.75 ping statistics ---
    2 packets transmitted, 2 packets received, 0.0% packet loss
    round-trip min/avg/max/stddev = 16.478/18.760/21.042/2.282 ms
     [~/] $ ping -c 2 75.75.76.76
    PING 75.75.76.76 (75.75.76.76): 56 data bytes
    64 bytes from 75.75.76.76: icmp_seq=0 ttl=54 time=33.904 ms
    64 bytes from 75.75.76.76: icmp_seq=1 ttl=54 time=32.788 ms
    
    --- 75.75.76.76 ping statistics ---
    2 packets transmitted, 2 packets received, 0.0% packet loss
    round-trip min/avg/max/stddev = 32.788/33.346/33.904/0.558 ms
    
     [~/] $ ping6 -c 2 2001:558:feed::1
    PING6(56=40+8+8 bytes) 2601:14d:8b00:7d0:6587:7cfc:e2cc:82a0 --> 2001:558:feed::1
    16 bytes from 2001:558:feed::1, icmp_seq=0 hlim=57 time=14.927 ms
    16 bytes from 2001:558:feed::1, icmp_seq=1 hlim=57 time=14.585 ms
    
    --- 2001:558:feed::1 ping6 statistics ---
    2 packets transmitted, 2 packets received, 0.0% packet loss
    round-trip min/avg/max/std-dev = 14.585/14.756/14.927/0.171 ms
     [~/] $ ping6 -c 2 2001:558:feed::2
    PING6(56=40+8+8 bytes) 2601:14d:8b00:7d0:6587:7cfc:e2cc:82a0 --> 2001:558:feed::2
    16 bytes from 2001:558:feed::2, icmp_seq=0 hlim=54 time=12.694 ms
    16 bytes from 2001:558:feed::2, icmp_seq=1 hlim=54 time=11.555 ms
    
    --- 2001:558:feed::2 ping6 statistics ---
    2 packets transmitted, 2 packets received, 0.0% packet loss
    round-trip min/avg/max/std-dev = 11.555/12.125/12.694/0.569 ms
    
  •   Answer by Brad Solomon · 6 years ago

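    Upon further investigation, this issue does not appear to be caused directly by aiohttp/asyncio, but rather by limitations stemming from both:

    • The capacity/rate-limiting of the DNS servers
    • The system-level maximum number of open files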

    Firstly, for those looking to get some beefed-up DNS servers (I will probably not go that route), the big-name options seem to be:

    • 1.1.1.1 (Cloudflare)
    • 8.8.8.8 (Google Public DNS)
    • Amazon Route 53

    (Good intro to DNS for those, like me, who are lacking in networking concepts.)

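    The first thing I did was run the above code on a beefed-up AWS EC2 instance: an h1.16xlarge running Ubuntu, which is IO-optimized. I can't say this in itself helped, but it certainly didn't hurt. I'm not too familiar with the default DNS server used by an EC2 instance, but when replicating the above script there, the OSError with errno == 8 from above went away.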

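    However, that raised a new exception in its place: OSError with code 24, "Too many open files." My hotfix solution (not saying it's the most sustainable or safe) was to increase the maximum open-file limits. I did this via: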

    sudo vim /etc/security/limits.conf
    # Add these lines
    root    soft    nofile  100000
    root    hard    nofile  100000
    ubuntu    soft    nofile  100000
    ubuntu    hard    nofile  100000
    
    sudo vim /etc/sysctl.conf
    # Add this line
    fs.file-max = 2097152
    
    sudo sysctl -p
    
    sudo vim /etc/pam.d/common-session
    # Add this line
    session required pam_limits.so
    
    sudo reboot
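    The limits.conf/sysctl changes above are system-wide and take effect only after a reboot or re-login. As a lighter-weight check, a process can inspect its own open-file limit and raise its soft limit toward the hard limit at runtime with the standard resource module (Unix only). This is a sketch assuming that headroom is enough for the workload: it cannot exceed the hard limit without privileges, and raise_nofile_soft_limit is a hypothetical name.

```python
import resource


def raise_nofile_soft_limit(target):
    """Try to raise this process's RLIMIT_NOFILE soft limit to `target`.

    The soft limit can only go up to the hard limit (unless privileged), so
    the target is clamped. Returns the (soft, hard) pair in effect afterwards.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    new_soft = target if hard == resource.RLIM_INFINITY else min(target, hard)
    if soft != resource.RLIM_INFINITY and new_soft > soft:
        try:
            resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
        except (ValueError, OSError):
            # e.g. macOS rejects values above its per-process file cap
            pass
    return resource.getrlimit(resource.RLIMIT_NOFILE)


if __name__ == "__main__":
    print(raise_nofile_soft_limit(10_000))
```

    Calling this once at startup, before asyncio.run(), would let you see the limit the script actually runs with.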
    

    Admittedly I'm groping in the dark, but this, coupled with asyncio.Semaphore(1024) (example here), resulted in neither of the two exceptions above being raised:

    # Then call this from bulk_get with asyncio.Semaphore(n)
    async def bounded_get(sem, url, session) -> str:
        async with sem:
            return await get(url, session)
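    To see what the semaphore buys without touching the network, here is a self-contained sketch in which a hypothetical fake_get (just an asyncio.sleep) stands in for get(), and the code records the peak number of requests in flight. One detail worth flagging: the semaphore is created inside the coroutine that asyncio.run() executes, because on Python 3.7 a Semaphore created at module level binds to the default event loop, not to the new loop asyncio.run() creates, which can fail once callers have to wait on it.

```python
import asyncio


async def bounded_gather_demo(n_requests, limit):
    """Run n_requests fake requests with at most `limit` in flight; return results and peak."""
    sem = asyncio.Semaphore(limit)  # created inside the running loop
    in_flight = 0
    peak = 0

    async def fake_get(i):
        # Hypothetical stand-in for get(url, session): just sleeps briefly.
        nonlocal in_flight, peak
        in_flight += 1
        peak = max(peak, in_flight)
        await asyncio.sleep(0.01)
        in_flight -= 1
        return i

    async def bounded_get(i):
        async with sem:
            return await fake_get(i)

    results = await asyncio.gather(
        *(bounded_get(i) for i in range(n_requests)), return_exceptions=True
    )
    return results, peak


if __name__ == "__main__":
    results, peak = asyncio.run(bounded_gather_demo(n_requests=200, limit=20))
    print(len(results), peak)  # 200 results, peak never above 20
```

    In the real script the same shape applies: create the semaphore inside bulk_get and pass it to bounded_get for each URL.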
    

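    Of ~25k input URLs, only ~100 GET requests returned exceptions, mainly because those websites were legitimately broken, and the total time to completion was within a few minutes, which I consider acceptable.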