
Accessing a single element of a large published array with Dask

dask-distributed dask-delayed dask

2

sudouser2010 · Tech Community · 7 years ago

In the example below, client.get_dataset('array1')[0] takes roughly as long as client.get_dataset('array1').

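import distributed

client = distributed.Client()

# A plain Python list of ten million integers
data = [1] * 10000000
payload = {'array1': data}
client.publish_dataset(**payload)  # publish the list under the name 'array1'

# Fetches the whole list to the client, then indexes it locally
one_element = client.get_dataset('array1')[0]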

1 answer | as of 7 years ago

1

4

MRocklin 7 years ago

Client 1

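import dask.array as da
from dask.distributed import Client

client = Client()  # assumed: a client for the cluster, created as in the question
x = da.ones(10000000, chunks=(100000,))  # 1e7 size array cut into 1e5 size chunks
x = x.persist()  # persist array on the workers of the cluster

client.publish_dataset(x=x)  # store the metadata of x on the scheduler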

Client 2

# client is assumed to be a Client connected to the same scheduler as client 1
x = client.get_dataset('x')  # get the lazy collection x
x[0].compute()  # this selection happens on the worker, only the result comes down
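
For anyone who wants to try the two snippets above in one go, here is a minimal single-process sketch. It assumes an in-process cluster started with Client(processes=False), which is only a stand-in for a real deployment, and it uses the same client for both the publishing and the reading role. The names published and first are made up for illustration.

```python
import dask.array as da
from dask.distributed import Client

# Assumption: an in-process (threaded) local cluster, for illustration only.
client = Client(processes=False)

# "Client 1" role: build the chunked array, keep it on the workers, publish it.
x = da.ones(10_000_000, chunks=(100_000,)).persist()
client.publish_dataset(x=x)

# "Client 2" role: fetch the lazy collection by name, then select one element.
published = client.get_dataset('x')
first = float(published[0].compute())  # only this scalar is sent to the client
print(first)

# Clean up the published name and shut the client down.
client.unpublish_dataset('x')
client.close()
```

The point of the design is that get_dataset returns a lazy dask array rather than the data itself, so the indexing is part of the task graph and is evaluated on the workers.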
