代码之家  ›  专栏  ›  技术社区  ›  StefanK

在大数据集上找到离每个点最近的直线,可能使用shapely和rtree

  •  6
  • StefanK  · 技术社区  · 7 年前

    我有一个城市的简化地图,其中街道作为线串,地址作为点。我需要找到从每个点到任何街道线的最近路径。我有一个可以这样做的工作脚本,但它在多项式时间内运行,因为它嵌套了for循环。对于150000行(shapely LineString)和10000点(shapely Point),在8GB Ram计算机上完成需要10个小时。

    import pandas as pd
    import shapely
    from shapely import Point, LineString
    
    def connect_nodes_to_closest_edges(edges_df , nodes_df,
                                       edges_geom,
                                       nodes_geom):
        """Finds closest line to points and returns 2 dataframes:
            edges_df
            nodes_df
        """
        for i in range(len(nodes_df)):
            point = nodes_df.loc[i,nodes_geom]
            shortest_distance = 100000
            for j in range(len(edges_df)):
                line = edges_df.loc[j,edges_geom]
                if line.distance(point) < shortest_distance:
                    shortest_distance = line.distance(point)
                    closest_street_index = j
                    closest_line = line
                    ...
    

    然后我将结果保存在一个表中作为一个新列,该列将点到线的最短路径添加为一个新列。

    有没有办法通过增加一些功能来加快速度?

    rtree公司

    Faster way of polygon intersection with shapely

    https://pypi.python.org/pypi/Rtree/

    很抱歉,如果这已经得到了回答,但我在这里或地理信息系统上都没有找到答案。堆栈交换

    谢谢你的建议!

    2 回复  |  直到 7 年前
        1
  •  6
  •   eguaio    7 年前

    这里有一个解决方案,使用 rtree 稍后,使用位于点中心的框查询rtree。你会得到几个 您需要检查最小点击数,但点击数将为

    solutions dict您将获得每个点的线id,最近的线段,

    代码中有一些注释可以帮助您。 MIN_SIZE INFTY

    最小尺寸

    A太大了 最小尺寸

    还有一条评论。我假设没有太大的细分市场。由于我们使用段作为rtree中方框的对角线,如果在一条直线中有一些较大的段,这意味着将为该段分配一个巨大的方框,并且所有地址框都将与其相交。为了避免这种情况,始终可以通过添加更多中间点来人为提高线条的分辨率。

    import math
    from rtree import index
    from shapely.geometry import Polygon, LineString
    
    INFTY = 1000000
    MIN_SIZE = .8
    # MIN_SIZE should be a vaule such that if you build a box centered in each 
    # point with edges of size 2*MIN_SIZE, you know a priori that at least one 
    # segment is intersected with the box. Otherwise, you could get an inexact 
    # solution, there is an exception checking this, though.
    
    
    def distance(a, b):
        return math.sqrt( (a[0]-b[0])**2 + (a[1]-b[1])**2 ) 
    
    def get_distance(apoint, segment):
        a = apoint
        b, c = segment
        # t = <a-b, c-b>/|c-b|**2
        # because p(a) = t*(c-b)+b is the ortogonal projection of vector a 
        # over the rectline that includes the points b and c. 
        t = (a[0]-b[0])*(c[0]-b[0]) + (a[1]-b[1])*(c[1]-b[1])
        t = t / ( (c[0]-b[0])**2 + (c[1]-b[1])**2 )
        # Only if t 0 <= t <= 1 the projection is in the interior of 
        # segment b-c, and it is the point that minimize the distance 
        # (by pitagoras theorem).
        if 0 < t < 1:
            pcoords = (t*(c[0]-b[0])+b[0], t*(c[1]-b[1])+b[1])
            dmin = distance(a, pcoords)
            return pcoords, dmin
        elif t <= 0:
            return b, distance(a, b)
        elif 1 <= t:
            return c, distance(a, c)
    
    def get_rtree(lines):
        def generate_items():
            sindx = 0
            for lid, l in lines:
                for i in xrange(len(l)-1):
                    a, b = l[i]
                    c, d = l[i+1]
                    segment = ((a,b), (c,d))
                    box = (min(a, c), min(b,d), max(a, c), max(b,d)) 
                    #box = left, bottom, right, top
                    yield (sindx, box, (lid, segment))
                    sindx += 1
        return index.Index(generate_items())
    
    def get_solution(idx, points):
        result = {}
        for p in points:
            pbox = (p[0]-MIN_SIZE, p[1]-MIN_SIZE, p[0]+MIN_SIZE, p[1]+MIN_SIZE)
            hits = idx.intersection(pbox, objects='raw')    
            d = INFTY
            s = None
            for h in hits: 
                nearest_p, new_d = get_distance(p, h[1])
                if d >= new_d:
                    d = new_d
                    s = (h[0], h[1], nearest_p, new_d)
            result[p] = s
            print s
    
            #some checking you could remove after you adjust the constants
            if s == None:
                raise Exception("It seems INFTY is not big enough.")
    
            pboxpol = ( (pbox[0], pbox[1]), (pbox[2], pbox[1]), 
                        (pbox[2], pbox[3]), (pbox[0], pbox[3]) )
            if not Polygon(pboxpol).intersects(LineString(s[1])):  
                msg = "It seems MIN_SIZE is not big enough. "
                msg += "You could get inexact solutions if remove this exception."
                raise Exception(msg)
    
        return result
    

    我用这个例子测试了函数。

    xcoords = [i*10.0/float(1000) for i in xrange(1000)]
    l1 = [(x, math.sin(x)) for x in xcoords]
    l2 = [(x, math.cos(x)) for x in xcoords]
    points = [(i*10.0/float(50), 0.8) for i in xrange(50)]
    
    lines = [('l1', l1), ('l2', l2)]
    
    idx = get_rtree(lines)
    
    solutions = get_solution(idx, points)
    

    Result of test

        2
  •  1
  •   lenhhoxung    4 年前

    我一直在寻找解决方案,我发现 this 基本上,这是一种简单的方法,考虑了点和线的边界框的重叠。 然而,由于空间索引,计算成本显著降低。