代码之家  ›  专栏  ›  技术社区  ›  user2667066

numpy:大量线段/点的快速规则间隔平均值

  •  3
  • user2667066  · 技术社区  · 6 年前

    这似乎是图像分析中需要做的很多事情。那么这个操作有名字吗?在numpy中,最快的方法是什么来计算这样一组规则间隔的平均y值?这有点像插值,我想,只是我不想只取周围两个点的平均值,而是取一个固定间隔内所有点的加权平均值(加上一点重叠)。

    因此,假设一条水平线有5个线段,由[0,1.1,2.2,2.3,2.8,4]分隔(即直线从0到4)。假设每一段采用任意的着色值,例如,我们可以有5个着色值[0,0.88,0.55,0.11,0.44]——其中0是黑色,1是白色。如果我想用4个像素来绘制这个图,我需要创建4个值,从0…1,1…2,等等,并且希望计算返回每个值的以下值:

    0…1=0(这由第一个线段0->1.1涵盖)

    3…4=0.44(这包含在最后一个线段中,2.8->4)

    然而,如果我想将这些数据拟合成一条2像素长的线,那么这2个像素将具有以下值:

    0...2 = 1.1 / 2 * 0 + 0.9 / 2 * 0.88

    3 回复  |  直到 6 年前
        1
  •  4
  •   Mad Physicist    6 年前

    如你所料,这个问题有一个纯粹简单的解决方案。诀窍是巧妙地混合 np.searchsorted ,它会将您的常规网格放置在离原件最近的箱子上,并且 np.add.reduceat 要计算箱子的总和:

    import numpy as np
    
    def distribute(x, y, n):
        """
        Down-samples/interpolates the y-values of each segment across a
        domain with `n` points. `x` represents segment endpoints, so should
        have one more element than `y`. 
        """
        y = np.asanyarray(y)
        x = np.asanyarray(x)
    
        new_x = np.linspace(x[0], x[-1], n + 1)
        # Find the insertion indices
        locs = np.searchsorted(x, new_x)[1:]
        # create a matrix of indices
        indices = np.zeros(2 * n, dtype=np.int)
        # Fill it in
        dloc = locs[:-1] - 1
        indices[2::2] = dloc
        indices[1::2] = locs
    
        # This is the sum of every original segment a new segment touches
        weighted = np.append(y * np.diff(x), 0)
        sums = np.add.reduceat(weighted, indices)[::2]
    
        # Now subtract the adjusted portions from the right end of the sums
        sums[:-1] -= (x[dloc + 1] - new_x[1:-1]) * y[dloc]
        # Now do the same for the left of each interval
        sums[1:] -= (new_x[1:-1] - x[dloc]) * y[dloc]
    
        return new_x, sums / np.diff(new_x)
    
    
    seg = [0, 1.1, 2.2, 2.3, 2.8, 4]
    color = [0, 0.88, 0.55, 0.11, 0.44]
    
    seg, color = distribute(seg, color, 4)
    print(seg, color)
    

    结果是

    [0. 1. 2. 3. 4.] [0.    0.792 0.374 0.44 ]
    

    基准

    EE_'s solution 我和我的同意了答案,并检查了时间安排。我稍微修改了另一个解决方案,使其具有与我相同的接口:

    from scipy.interpolate import interp1d
    
    def EE_(x, y, n):
        I = np.zeros_like(x)
        I[1:] = np.cumsum(np.diff(x) * y)
        f = interp1d(x, I, bounds_error=False, fill_value=(0, I[-1]))
        pix_x = np.linspace(x[0], x[-1], n + 1)
        pix_y = (f(pix_x[1:]) - f(pix_x[:-1])) / (pix_x[1:] - pix_x[:-1])
        return pix_x, pix_y
    

    这里是测试台(方法 MadPhysicist 刚刚从 distribute x y

    np.random.seed(0x1234ABCD)
    
    x = np.cumsum(np.random.gamma(3.0, 0.2, size=1001))
    y = np.random.uniform(0.0, 1.0, size=1000)
    
    tests = (
        MadPhysicist,
        EE_,
    )
    
    for n in (5, 10, 100, 1000, 10000):
        print(f'N = {n}')
        results = {test.__name__: test(x, y, n) for test in tests}
    
        for name, (x_out, y_out) in results.items():
            print(f'{name}:\n\tx = {x_out}\n\ty = {y_out}')
    
        allsame = np.array([[np.allclose(x1, x2) and np.allclose(y1, y2)
                             for x2, y2 in results.values()]
                            for x1, y1 in results.values()])
        print()
        print(f'Result Match:\n{allsame}')
    
        from IPython import get_ipython
        magic = get_ipython().magic
    
        for test in tests:
            print(f'{test.__name__}({n}):\n\t', end='')
            magic(f'timeit {test.__name__}(x, y, n)')
    

    我将跳过数据和协议打印输出(结果相同),并显示计时:

    N = 5
    MadPhysicist: 50.6 µs ± 349 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
    EE_:           110 µs ± 568 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
    
    N = 10
    MadPhysicist: 50.5 µs ± 732 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
    EE_:           111 µs ± 635 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)   
    
    N = 100
    MadPhysicist: 54.5 µs ± 284 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
    EE_:           114 µs ± 215 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
    
    N = 1000
    MadPhysicist: 107 µs ± 5.73 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
    EE_:          148 µs ± 5.11 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
    
    N = 10000
    MadPhysicist: 458 µs ± 2.21 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
    EE_:          301 µs ± 4.57 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
    

        2
  •  2
  •   EE_    6 年前

    你仍然可以通过线性插值来完成。虽然你的函数是分段常数,但你需要它在很多小区间上的平均值。某个函数的平均值 f ( )从 b 就是它在这个范围内的积分除以 b . 分段常数函数的积分将是分段线性函数。所以,假设你有你的数据:

    x = [0, 1.1, 2.2, 2.3, 2.8, 4]
    y = [0, 0.88, 0.55, 0.11, 0.44]
    

    做一个函数,使其积分为 . 这里是数组 I ,以及函数 f 是它的线性插值,它将给出任意点的精确值:

    I = numpy.zeros_like(x)
    I[1:] = numpy.cumsum(numpy.diff(x) * y)
    f = scipy.interpolate.interp1d(x, I)
    

    pix_x = numpy.linspace(0, 4, 5)
    pix_y = (f(pix_x[1:]) - f(pix_x[:-1])) / (pix_x[1:] - pix_x[:-1])
    

    我们可以检查这些数组中的内容:

    >>> pix_x
    array([0., 1., 2., 3., 4.])
    >>> pix_y
    array([0.   , 0.792, 0.374, 0.44 ])
    

    像素的着色值现在处于 pix_y

    这应该是相当快的,甚至对许多,许多点:

    def test(x, y):
        I = numpy.zeros_like(x)
        I[1:] = numpy.cumsum(numpy.diff(x) * y)
        f = scipy.interpolate.interp1d(x, I,
            bounds_error=False, fill_value=(0, I[-1]))
        pix_x = numpy.linspace(0, 1, 1001)
        pix_y = (f(pix_x[1:]) - f(pix_x[:-1])) / (pix_x[1:] - pix_x[:-1])
        return pix_y
    

    timeit 报告:

    225 ms ± 37.6 ms per loop
    

    bounds_error=False fill_value=(0, I[-1]) interp1d . 这样做的效果是假设着色函数在 内部1d 不需要对输入值进行排序;在上面的测试中,我同时给出了 y 作为0到1之间的均匀随机数的数组。但是,如果您确定它们已排序,则可以通过 assume_sorted=True 你应该提高速度:

    20.2 ms ± 377 µs per loop
    
        3
  •  0
  •   user3097732    6 年前

    scipy.signal.resample

    这将把你的信号转换成一个频谱,然后转换成一个沿x轴有规律间隔的新时间序列。

    另请参见以下问题: How to uniformly resample a non uniform signal using scipy .