
Using Dask on EC2 instances raises "Couldn't gather 1 keys…"

  •  1
  • r_31415  · Tech Community  · 6 years ago

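    I launched two EC2 instances, installed dask with conda, started a scheduler and a worker on their respective instances, and the scheduler was able to receive the connection from the worker. However, after starting a client, gathering a result (e.g. x.result()) throws the error

    WARNING - Couldn't gather 1 keys, rescheduling

    and the connection between the scheduler and the worker is terminated.

    This is almost the same error as in issue 2095, which was fixed in 1278. Unfortunately, it is not clear how to fix this problem with the new flags.

    My sessions are as follows:

    Scheduler - terminal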

    >>> from dask.distributed import Client
    >>> client = Client('<domain-scheduler>:8786')
    >>> def inc(x):
    ...   return x + 1
    ...
    >>> x = client.submit(inc, 10)
    >>> x.result()
    distributed.client - WARNING - Couldn't gather 1 keys, rescheduling {'inc-17ff1aa09aeed9c364fc31df7522511e': ('tcp://172.30.3.63:38971',)}
    ^CTraceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/home/ubuntu/anaconda2/envs/dask-env/lib/python2.7/site-packages/distributed/client.py", line 190, in result
        raiseit=False)
      File "/home/ubuntu/anaconda2/envs/dask-env/lib/python2.7/site-packages/distributed/client.py", line 652, in sync
        return sync(self.loop, func, *args, **kwargs)
      File "/home/ubuntu/anaconda2/envs/dask-env/lib/python2.7/site-packages/distributed/utils.py", line 273, in sync
        e.wait(10)
      File "/home/ubuntu/anaconda2/envs/dask-env/lib/python2.7/threading.py", line 614, in wait
        self.__cond.wait(timeout)
      File "/home/ubuntu/anaconda2/envs/dask-env/lib/python2.7/threading.py", line 359, in wait
        _sleep(delay)
    KeyboardInterrupt
    

    Scheduler - dask-scheduler

    (dask-env) ubuntu@ip-172-30-3-136:~$ dask-scheduler --host <domain-scheduler>:8786 --bokeh-port 8080
    distributed.scheduler - INFO - -----------------------------------------------
    distributed.scheduler - INFO - Clear task state
    distributed.scheduler - INFO -   Scheduler at:   tcp://172.30.3.136:8786
    distributed.scheduler - INFO -       bokeh at:         172.30.3.136:8080
    distributed.scheduler - INFO - Local Directory:      /tmp/scheduler-TX9nqO
    distributed.scheduler - INFO - -----------------------------------------------
    distributed.scheduler - INFO - Register tcp://172.30.3.63:38971
    distributed.scheduler - INFO - Starting worker compute stream, tcp://172.30.3.63:38971
    distributed.core - INFO - Starting established connection
    distributed.scheduler - INFO - Receive client connection: Client-b5d903b5-8620-11e8-8a4c-06a866fbd474
    distributed.core - INFO - Starting established connection
    distributed.scheduler - INFO - Remove worker tcp://172.30.3.63:38971
    distributed.core - INFO - Removing comms to tcp://172.30.3.63:38971
    distributed.scheduler - INFO - Lost all workers
    distributed.scheduler - ERROR - Workers don't have promised key: ['tcp://172.30.3.63:38971'], inc-17ff1aa09aeed9c364fc31df7522511e
    None
    ^Cdistributed.scheduler - INFO - End scheduler at u'tcp://<domain>:8786'
    

    Worker - dask-worker

    (dask-env) ubuntu@ip-172-30-3-63:~$ dask-worker --host <domain-worker>:8786 <domain-scheduler>:8786
    distributed.nanny - INFO -         Start Nanny at: 'tcp://172.30.3.63:8786'
    distributed.worker - INFO -       Start worker at:    tcp://172.30.3.63:38971
    distributed.worker - INFO -          Listening to:    tcp://172.30.3.63:38971
    distributed.worker - INFO -              bokeh at:           172.30.3.63:8789
    distributed.worker - INFO -              nanny at:           172.30.3.63:8786
    distributed.worker - INFO - Waiting to connect to: tcp://<domain-scheduler>:8786
    distributed.worker - INFO - -------------------------------------------------
    distributed.worker - INFO -               Threads:                          1
    distributed.worker - INFO -                Memory:                    1.04 GB
    distributed.worker - INFO -       Local Directory: /home/ubuntu/dask-worker-space/worker-EnKL22
    distributed.worker - INFO - -------------------------------------------------
    distributed.worker - INFO -         Registered to: tcp://<domain-scheduler>:8786
    distributed.worker - INFO - -------------------------------------------------
    distributed.core - INFO - Starting established connection
    distributed.worker - INFO - Stopping worker at tcp://172.30.3.63:38971
    distributed.worker - WARNING - Heartbeat to scheduler failed
    distributed.nanny - INFO - Closing Nanny at 'tcp://172.30.3.63:8786'
    distributed.dask_worker - INFO - End worker
    

    As you can see, the session is terminated after running x.result(). I also tried including --listen-address and --contact-address, without success.

    2 answers
        1
  •  1
  •   Matt Nicolls    6 years ago

    curl <domain-worker>:8789
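
    The same reachability check can be sketched in Python for ports that do not speak HTTP, such as the worker's main port. This is only a minimal sketch: the helper name port_is_reachable is hypothetical, <domain-worker> is the placeholder from the question, and 8789 is the worker's bokeh port shown in the worker log. It tests nothing more than whether a TCP connection can be opened, for instance from the scheduler machine.

```python
import socket


def port_is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port opens within `timeout` seconds.

    Hypothetical helper: it only checks that the port accepts connections,
    not that a dask worker is actually answering on it.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


# Example call (placeholder host from the question):
#   port_is_reachable("<domain-worker>", 8789)
```

    If this returns False for a port the worker is listening on, a firewall or security-group rule between the two instances is one plausible cause worth checking.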

        2
  •  0
  •   r_31415    6 years ago

    dask-scheduler and dask-worker commands, followed by the client connection:

    dask-scheduler --host <domain-scheduler> --port 8786 --bokeh-port <open-port>
    

    dask-worker --host <domain-worker> <domain-scheduler>:8786 --worker-port 8786
    

    client = Client('tcp://<domain-scheduler>:8786')
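
    To confirm that a setup like this works end to end, the inc example from the question can be rerun with a timeout, so that a broken connection surfaces as an error instead of hanging until Ctrl-C. This is a sketch rather than part of the original answer: the helper name smoke_test and the 10-second default are assumptions, and it relies only on client.submit and the timeout argument of Future.result.

```python
def inc(x):
    return x + 1


def smoke_test(client, timeout=10):
    """Submit inc(10) and wait at most `timeout` seconds for the result.

    Hypothetical helper: `client` is expected to be a dask.distributed.Client.
    If the result cannot be gathered in time, result() raises a timeout
    error instead of blocking indefinitely.
    """
    future = client.submit(inc, 10)
    return future.result(timeout=timeout)


# Usage against a real cluster (requires dask.distributed):
#   from dask.distributed import Client
#   client = Client('tcp://<domain-scheduler>:8786')
#   print(smoke_test(client))
```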