
Converting a squareform distance matrix to long format in Python

  •  0
  • Mr_and_Mrs_D  · 6 years ago

    Code:

    import numpy as np
    import pandas as pd
    from scipy.spatial.distance import pdist, squareform
    
    ids = ['1', '2', '3']
    points=[(0,0), (1,1), (3,3)]
    distances = pdist(np.array(points), metric='euclidean')
    print(distances)
    distance_matrix = squareform(distances)
    print(distance_matrix)
    

    Prints:

    [1.41421356 4.24264069 2.82842712]
    [[0.         1.41421356 4.24264069]
     [1.41421356 0.         2.82842712]
     [4.24264069 2.82842712 0.        ]]
    

    as expected.

    I would like to convert this to long format so that I can write it to a csv, like

    id1,id2,distance
    1,1,0
    1,2,1.41421356
    1,3,4.24264069
    2,1,1.41421356
    2,2,0
    2,3,2.82842712
    

    and so on. What is the most efficient way to do this? Using pandas is an option.

    2 answers
        1
  •  1
  •   jezrael    6 years ago

    Use the DataFrame constructor with stack, then reset_index to turn the two index levels into columns:

    df = pd.DataFrame(distance_matrix, index=ids, columns=ids).stack().reset_index()
    df.columns=['id1','id2','distance']
    print (df)
      id1 id2  distance
    0   1   1  0.000000
    1   1   2  1.414214
    2   1   3  4.242641
    3   2   1  1.414214
    4   2   2  0.000000
    5   2   3  2.828427
    6   3   1  4.242641
    7   3   2  2.828427
    8   3   3  0.000000
    

    Or use the DataFrame constructor with numpy.repeat, numpy.tile and ravel:

    df = pd.DataFrame({'id1':np.repeat(ids, len(ids)), 
                       'id2':np.tile(ids, len(ids)),
                       'dist':distance_matrix.ravel()})
    print (df)
      id1 id2      dist
    0   1   1  0.000000
    1   1   2  1.414214
    2   1   3  4.242641
    3   2   1  1.414214
    4   2   2  0.000000
    5   2   3  2.828427
    6   3   1  4.242641
    7   3   2  2.828427
    8   3   3  0.000000
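
    Since the question asks for a csv, here is a minimal sketch of the remaining step, assuming the same ids and points as in the question. The in-memory buffer is only a stand-in for a real path such as 'distances.csv' (a placeholder name), and float_format is an optional choice made here to match the eight decimals shown in the question.

```python
import io

import numpy as np
import pandas as pd
from scipy.spatial.distance import pdist, squareform

ids = ['1', '2', '3']
points = [(0, 0), (1, 1), (3, 3)]
distance_matrix = squareform(pdist(np.array(points), metric='euclidean'))

# Long format via stack, as in the first approach above
df = pd.DataFrame(distance_matrix, index=ids, columns=ids).stack().reset_index()
df.columns = ['id1', 'id2', 'distance']

# index=False leaves out the 0..8 row labels; float_format fixes the precision
buffer = io.StringIO()  # stand-in for a real path such as 'distances.csv'
df.to_csv(buffer, index=False, float_format='%.8f')
csv_text = buffer.getvalue()
print(csv_text)
```

    Passing a file path instead of the buffer writes the same text to disk.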
    
        2
  •  0
  •   Divakar    6 years ago

    I would suggest using indices_merged_arr_generic_using_cp -

    Helper function -

    import numpy as np
    import functools
    
    # https://stackoverflow.com/a/46135435/ by @unutbu
    def indices_merged_arr_generic_using_cp(arr):
        """
        Based on cartesian_product
        http://stackoverflow.com/a/11146645/190597 (senderle)
        """
        shape = arr.shape
        arrays = [np.arange(s, dtype='int') for s in shape]
        broadcastable = np.ix_(*arrays)
        broadcasted = np.broadcast_arrays(*broadcastable)
        rows, cols = functools.reduce(np.multiply, broadcasted[0].shape), len(broadcasted)+1
        out = np.empty(rows * cols, dtype=arr.dtype)
        start, end = 0, rows
        for a in broadcasted:
            out[start:end] = a.reshape(-1)
            start, end = end, end + rows
        out[start:] = arr.flatten()
        return out.reshape(cols, rows).T
    

    Usage -

    In [169]: out = indices_merged_arr_generic_using_cp(distance_matrix)
    
    In [170]: np.savetxt('out.txt', out, fmt="%i,%i,%f")
    
    In [171]: !cat out.txt
    0,0,0.000000
    0,1,1.414214
    0,2,4.242641
    1,0,1.414214
    1,1,0.000000
    1,2,2.828427
    2,0,4.242641
    2,1,2.828427
    2,2,0.000000
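
    The rows above carry integer positions (0, 1, 2) rather than the labels in ids. A minimal sketch of mapping positions back to labels follows, assuming the ids and points from the question. Note that np.indices and the standard csv module are swapped in here in place of the helper function, and the in-memory buffer stands in for a real output file.

```python
import csv
import io

import numpy as np
from scipy.spatial.distance import pdist, squareform

ids = np.array(['1', '2', '3'])
points = np.array([(0, 0), (1, 1), (3, 3)])
distance_matrix = squareform(pdist(points))

# Row and column position of every cell, flattened in row-major order
rows, cols = np.indices(distance_matrix.shape)

# Fancy indexing turns positions into labels; tolist() yields plain Python values
records = zip(ids[rows.ravel()].tolist(),
              ids[cols.ravel()].tolist(),
              distance_matrix.ravel().tolist())

buffer = io.StringIO()  # stand-in for open('out.csv', 'w', newline='')
writer = csv.writer(buffer)
writer.writerow(['id1', 'id2', 'distance'])
writer.writerows(records)
csv_text = buffer.getvalue()
```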
    

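    To get distance_matrix, we can also use SciPy's cdist: cdist(points, points). There is also the eucl_dist package (disclaimer: I am its author), which contains various methods for computing Euclidean distances that are more efficient than SciPy's cdist, especially for large arrays.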
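
    As a quick check of the cdist route, here is a small sketch assuming the same points as in the question. It compares the matrix from cdist with the one rebuilt from pdist and squareform. cdist evaluates all n * n pairs, while pdist evaluates only the n * (n - 1) / 2 unique ones.

```python
import numpy as np
from scipy.spatial.distance import cdist, pdist, squareform

points = np.array([(0, 0), (1, 1), (3, 3)])

# Full pairwise matrix computed directly
full = cdist(points, points, metric='euclidean')

# Same matrix rebuilt from the condensed pdist vector
from_condensed = squareform(pdist(points, metric='euclidean'))

# True when both routes agree up to floating-point tolerance
matrices_match = np.allclose(full, from_condensed)
print(matrices_match)
```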