
Digitizing values to the “floor” bin in Python

    okkhoy · 6 years ago

    I need to digitize some values so that the returned index is the “floor” or the “ceiling” bin, whichever edge is closer to the value.

    For example, with bins = numpy.array([0.0, 0.5, 1.0, 1.5, 2.0]) and a value of 0.2, I want the index to be 0; for a value of 0.26 the returned index should be 1, and so on.
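The requirement can be stated directly as "the index of the bin edge closest to the value". Below is a brute-force sketch of that definition, useful mainly as a reference to check faster versions against. `nearest_edge_index` is a hypothetical name, it costs O(len(bins)) per value, and an exact tie goes to the lower edge because `argmin` returns the first minimum.

```python
import numpy

def nearest_edge_index(value, bins):
    # Hypothetical helper: index of the edge with the smallest absolute distance.
    # argmin returns the first minimum, so an exact tie picks the lower edge.
    return int(numpy.abs(bins - value).argmin())

bins = numpy.array([0.0, 0.5, 1.0, 1.5, 2.0])
print(nearest_edge_index(0.2, bins))   # 0
print(nearest_edge_index(0.26, bins))  # 1
```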

    I have the following ugly function that does what I want:

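    import numpy
    
    def get_bin_index(value, bins):
        bin_diff = bins[1] - bins[0]  # assumes equally spaced bins
        # digitize returns i such that bins[i-1] <= value < bins[i]
        index = numpy.digitize(value, bins)
        # below the first edge digitize returns 0, at or past the last edge it
        # returns len(bins); the nearest edge is then the first or last one
        if index == 0:
            return 0
        if index == len(bins):
            return len(bins) - 1
        # step back to the lower edge when the value is closer to it
        if bins[index] - value > bin_diff / 2.0:
            index -= 1
        return index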
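A vectorized sketch of the same idea that also handles arrays of values, unequally spaced (but sorted) edges, and values outside the edge range. It swaps `numpy.digitize` for `numpy.searchsorted` and compares the distances to the two neighbouring edges; `nearest_bin_index` is a hypothetical name, not something from the question or answers.

```python
import numpy

def nearest_bin_index(values, bins):
    # Hypothetical helper: nearest-edge index for each value; `bins` must be sorted.
    values = numpy.asarray(values, dtype=float)
    # Index of the first edge >= each value (the "ceiling" edge).
    right = numpy.searchsorted(bins, values)
    # Keep both neighbours valid so values outside the edge range are safe.
    right = numpy.clip(right, 1, len(bins) - 1)
    left = right - 1
    # Choose the lower ("floor") edge only when it is strictly closer, so an exact
    # tie goes to the upper edge, as in the question's function.
    use_left = (values - bins[left]) < (bins[right] - values)
    return numpy.where(use_left, left, right)

bins = numpy.array([0.0, 0.5, 1.0, 1.5, 2.0])
print(nearest_bin_index([0.2, 0.26, 0.9, 2.5], bins))  # [0 1 2 4]
```

Because the work happens in a handful of array operations, the per-call overhead is paid once per array rather than once per value.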
    

    Is there a more concise way (one that reads better and/or is more efficient) to do this?


    Edit: including timing values (just to satisfy my curiosity!)

    In [1]: def get_bin_index(value, bins):
        ...:     bin_diff = bins[1]-bins[0]
        ...:     index = numpy.digitize(value, bins)
        ...:     if bins[index] - value > bin_diff/2.0:
        ...:         index -= 1
        ...:     return index
        ...:
    
    In [2]: def get_bin_index_c(value, bins):
        ...:     return numpy.rint((value-bins[0])/(bins[1]-bins[0]))
        ...:
    
    In [3]: def get_bin_index_mid_digitized(value, bins):
        ...:     return numpy.digitize(value, (bins[1:] + bins[:-1])/2.0)
        ...:
    
    In [4]: bin_halfs = numpy.array([0.0, 0.5, 1.0, 1.5, 2.0])
    
    In [5]: %timeit get_bin_index(0.9, bin_halfs)
    The slowest run took 5.71 times longer than the fastest. This could mean that an intermediate result is being cached.
    1000000 loops, best of 3: 4.93 µs per loop
    
    In [6]: %timeit get_bin_index_c(0.9, bin_halfs)
    The slowest run took 14.60 times longer than the fastest. This could mean that an intermediate result is being cached.
    100000 loops, best of 3: 2.34 µs per loop
    
    In [7]: %timeit get_bin_index_mid_digitized(0.9, bin_halfs)
    The slowest run took 4.09 times longer than the fastest. This could mean that an intermediate result is being cached.
    100000 loops, best of 3: 8.37 µs per loop
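Beyond speed, the two one-liners from the answers can be checked against each other on many values at once, since both accept arrays. A sketch assuming the equally spaced bins from the question; `by_rint` and `by_midpoints` are hypothetical names, and the random sample is arbitrary test data, unrelated to the timings above. The two agree everywhere except exactly on a midpoint, where `numpy.rint` rounds half to even.

```python
import numpy

bins = numpy.array([0.0, 0.5, 1.0, 1.5, 2.0])
width = bins[1] - bins[0]

def by_rint(values):
    # First answer's approach, cast to int (assumes equal bin widths).
    return numpy.rint((values - bins[0]) / width).astype(int)

def by_midpoints(values):
    # Second answer's approach: digitize against the bin midpoints.
    return numpy.digitize(values, (bins[1:] + bins[:-1]) / 2.0)

rng = numpy.random.default_rng(0)
values = rng.uniform(bins[0], bins[-1], size=10_000)
print(numpy.array_equal(by_rint(values), by_midpoints(values)))  # True
# On an exact midpoint they differ: rint(0.5) is 0, digitize puts 0.25 in bin 1.
print(by_rint(numpy.array([0.25])), by_midpoints(numpy.array([0.25])))  # [0] [1]
```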
    
    2 answers

    Answer 1 · kuppern87 · 6 years ago

    If bin_diff is the same for every pair of adjacent edges, you can do this in constant time with:

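    import numpy
    def get_bin_index2(value, bins):
        # equal bin widths assumed; rint rounds to the nearest integer (halves to even)
        return numpy.rint((value - bins[0]) / (bins[1] - bins[0])).astype(int)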
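The one-liner above leaves two cases open: values outside the edge range produce indices below 0 or above len(bins) - 1, and exact halves are rounded to the nearest even integer by `numpy.rint`, so 0.25 maps to 0 whereas the question's function gives 1. A sketch of a variant that clips and rounds halves up via floor(x + 0.5); `get_bin_index_clipped` is a hypothetical name, and equal bin widths are still assumed.

```python
import numpy

def get_bin_index_clipped(values, bins):
    # Hypothetical variant of the rint answer; still assumes equal bin widths.
    values = numpy.asarray(values, dtype=float)
    width = bins[1] - bins[0]
    # floor(x + 0.5) rounds exact halves up, unlike numpy.rint (halves to even).
    index = numpy.floor((values - bins[0]) / width + 0.5).astype(int)
    # Values beyond either end map to the first or last edge.
    return numpy.clip(index, 0, len(bins) - 1)

bins = numpy.array([0.0, 0.5, 1.0, 1.5, 2.0])
print(get_bin_index_clipped([0.2, 0.25, 0.26, 3.0, -1.0], bins))  # [0 1 1 4 0]
```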
    
    Answer 2 · Divakar · 6 years ago

    You can simply take the midpoints of the bins and pass them to np.digitize:

    np.digitize(value, (bins[1:] + bins[:-1])/2.0)
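The midpoint approach does not depend on equal spacing, because the midpoints between consecutive edges are exactly the decision boundaries for "nearest edge". Since there is one fewer midpoint than edges, `np.digitize` already returns an index in 0..len(bins)-1, so no clipping is needed. A sketch with arbitrary unevenly spaced edges; `get_bin_index_mid` is a hypothetical wrapper name.

```python
import numpy as np

def get_bin_index_mid(values, bins):
    # Midpoints between consecutive edges act as the decision boundaries.
    mids = (bins[1:] + bins[:-1]) / 2.0
    # digitize returns i with mids[i-1] <= v < mids[i]: an index in 0..len(bins)-1,
    # and a value exactly on a midpoint goes to the upper edge.
    return np.digitize(values, mids)

uneven = np.array([0.0, 0.1, 1.0, 5.0])
print(get_bin_index_mid([0.04, 0.06, 0.6, 2.9, 3.1, 10.0], uneven))  # [0 1 2 2 3 3]
```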