
How can I make np.where handle triangular matrices more efficiently?

itertools scipy numpy python

politinsa · 7 years ago

I have this code, where distance is a lower triangular matrix defined as follows:

import numpy as np
import scipy.spatial.distance

distance = np.tril(scipy.spatial.distance.cdist(points, points))

def make_them_touch(distance):
    """
    Return every distance at which two points touch each other. See the example below.
    """
    thresholds = np.unique(distance)[1:]  # skip the leading 0; this step is cheap
    result = dict()
    for t in thresholds:
        # one full scan of the matrix per threshold: this is the slow part
        x, y = np.where(distance == t)
        result[t] = list(zip(x, y))
    return result

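My problem is that np.where is very slow on large matrices (e.g. 2000*100).
How can I speed this code up, either by improving the np.where call or by changing the algorithm?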

EDIT: As MaxU pointed out, the best optimization here is to avoid generating the square matrix at all and to use iterators instead.
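A minimal sketch of that idea, assuming the raw points array is available instead of the precomputed matrix. The function name make_them_touch_condensed is hypothetical (it is not from the original post). It relies on pdist returning the pairwise distances in the same order as combinations(range(n), 2), and, like the original code, it groups on exact float equality.

```python
from collections import defaultdict
from itertools import combinations

import numpy as np
from scipy.spatial.distance import pdist


def make_them_touch_condensed(points):
    """Group index pairs by pairwise distance without building the square matrix."""
    # pdist yields distances in the order (0, 1), (0, 2), ..., (n-2, n-1),
    # which is exactly the order produced by combinations(range(n), 2).
    pairs = combinations(range(len(points)), 2)
    result = defaultdict(list)
    for d, (i, j) in zip(pdist(points), pairs):
        # Store (j, i) so pairs follow the lower-triangular (row > column) layout.
        result[float(d)].append((j, i))
    return dict(result)
```

This visits each pair once, rather than rescanning the whole matrix for every distinct distance.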

Example:

points = np.array([
    [0, 0, 0, 0],
    [1, 1, 1, 1],
    [3, 3, 3, 3],
    [6, 6, 6, 6],
])

In [106]: distance = np.tril(scipy.spatial.distance.cdist(points, points))

In [107]: distance
Out[107]: 
array([[ 0.,  0.,  0.,  0.],
       [ 2.,  0.,  0.,  0.],
       [ 6.,  4.,  0.,  0.],
       [12., 10.,  6.,  0.]])

In [108]: make_them_touch(distance)
Out[108]: 
{2.0: [(1, 0)],
 4.0: [(2, 1)],
 6.0: [(2, 0), (3, 2)],
 10.0: [(3, 1)],
 12.0: [(3, 0)]}

1 answer · 7 years ago

MaxU - stand with Ukraine · 7 years ago

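UPDATE 1: here is a snippet for the upper triangular distance matrix (it doesn't really matter which triangle you use, because a distance matrix is always symmetric):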

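from itertools import combinations
from scipy.spatial.distance import pdist

# note: pairs that share the same distance overwrite each other in this dict
res = {tup[0]:tup[1] for tup in zip(pdist(points), list(combinations(range(len(points)), 2)))}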

Result (for a three-point input):

In [111]: res
Out[111]:
{1.4142135623730951: (0, 1),
 4.69041575982343: (0, 2),
 4.898979485566356: (1, 2)}

UPDATE 2: this version supports duplicate distances:

In [164]: import pandas as pd

First we build a pandas Series:

In [165]: s = pd.Series(list(combinations(range(len(points)), 2)), index=pdist(points))

In [166]: s
Out[166]:
2.0     (0, 1)
6.0     (0, 2)
12.0    (0, 3)
4.0     (1, 2)
10.0    (1, 3)
6.0     (2, 3)
dtype: object

Now we can group by the index and produce lists of coordinates:

In [167]: s.groupby(s.index).apply(list)
Out[167]:
2.0             [(0, 1)]
4.0             [(1, 2)]
6.0     [(0, 2), (2, 3)]
10.0            [(1, 3)]
12.0            [(0, 3)]
dtype: object
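To get back a plain dict like the one make_them_touch returns, the grouped Series can be converted with to_dict(). A self-contained sketch using the example points from the question (the variable names pairs and grouped are illustrative):

```python
from itertools import combinations

import numpy as np
import pandas as pd
from scipy.spatial.distance import pdist

points = np.array([
    [0, 0, 0, 0],
    [1, 1, 1, 1],
    [3, 3, 3, 3],
    [6, 6, 6, 6],
])

# One (i, j) pair per condensed distance, in pdist order.
pairs = list(combinations(range(len(points)), 2))
s = pd.Series(pairs, index=pdist(points))

# Group the pairs that share a distance, then convert to a regular dict.
grouped = s.groupby(s.index).apply(list).to_dict()
```

Note that the pairs come out as (i, j) with i < j (upper triangle), whereas the question's output uses (row, column) with row > column.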

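PS: the main idea here is that you shouldn't build a square distance matrix if you are going to flatten it and eliminate duplicates afterwards.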
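If pandas is not wanted, the same grouping can probably be done with NumPy alone by sorting the condensed distances once and splitting at the boundaries between distinct values. This is only a sketch: the function name group_pairs_numpy is hypothetical, it matches distances by exact float equality, and, unlike the question's code, it would keep a 0.0 key if two points coincide.

```python
import numpy as np
from scipy.spatial.distance import pdist


def group_pairs_numpy(points):
    """Map each distinct pairwise distance to the list of (i, j) pairs, i < j."""
    d = pdist(points)
    # Upper-triangle indices in row-major order match the order used by pdist.
    i, j = np.triu_indices(len(points), k=1)
    # A stable sort keeps pairs with equal distances in their original order.
    order = np.argsort(d, kind="stable")
    # On sorted data, return_index gives the start of each run of equal values.
    uniq, starts = np.unique(d[order], return_index=True)
    i_groups = np.split(i[order], starts[1:])
    j_groups = np.split(j[order], starts[1:])
    return {
        float(u): list(zip(gi.tolist(), gj.tolist()))
        for u, gi, gj in zip(uniq, i_groups, j_groups)
    }
```

The cost is dominated by a single sort over the n*(n-1)/2 distances, instead of one full matrix scan per distinct distance.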
