代码之家 › 专栏 › 技术社区 › Azeem

分布图搜索

distributed-system breadth-first-search distributed-computing graph-theory algorithm

Azeem · 技术社区 · 6 年前

给出了一个巨大的图,它被分成几个节点。每个节点包含顶点集的一些划分和全局邻接信息。

在这个分布图上,如果给定源顶点和它所在的节点的地址,实现BFS的最佳方法是什么。解决方案应该可靠、快速。

1 回复 | 直到 6 年前

btilly 6 年前

有一个许多想办法让它发挥作用。下面是一个简单的方法,它利用 https://en.wikipedia.org/wiki/MapReduce .

我假设有三个机器池可供您使用。

图形数据库
分布式密钥/值存储(如Cassandra)
映射/减少工作线程(如Hadoop)

Insert into your kv store the fact that you reached the starting vertex at `(0, Null)` where the pair is `(generation, from)`:
while not finished and last generation found new stuff:
     Do a mapreduce from your k/v store:
       map gets (vertex, (found_generation, from_vertex):
         - sends:
            if found_generation is current_generation:
                foreach new_vertex adjacent to vertex (lookup in graph db):
                    send: new_vertex: (current_generation + 1, vertex)
            else:
                send: vertex: (found_generation, from_vertex)
       reduce gets (vertex, (found1, v1), (found2, v2), ...)
           search list for sample record with minimum found
               store minimum found back in k/v store
if found target:
   recursively walk k/v store to reconstruct the path
clear k/v store of working set
return answer

关键是图和k/v存储的所有查找都是分布式的,map/reduce中的所有工作也都是分布式的。每代的同步是最小的。因此,这将以分布式的方式完成大部分工作。

还有一张演出便条。一般来说,从单台机器上的简单实现到分布式机器是成本和资源的一个数量级增长,随之而来的是巨大的可伸缩性。从一个简单的实现到一个优化得很好的实现往往会在性能上提高1-2个数量级,但是你只能做一台机器能做的事情。仔细选择要采取的方法。只有在你真的需要的时候才分发。

推荐文章

Bob · BFS用映射C替换数组++

6 年前

quantummidget · 正在查找BFS父关系数组

6 年前

Ganpat · BFS中队列大小的重要性

6 年前

R A · python中的BFS实现速度不够快

6 年前

Fueled By Coffee · 检测循环依存关系

7 年前

DK100 · 在广度优先搜索中处理重复节点

7 年前

Garrick · 邻接表表示的时间复杂度?

7 年前

Upender Reddy · 从左上角到右下角单元格的最小路径,在这里我们可以在北、南、东、西方向进行遍历

8 年前

César Vallejo · 将宽度优先搜索更改为深度优先搜索,N-Queen解算器

8 年前

Ankit Mishra · 图与BFS和DFS树的等价性

8 年前