Efficiently creating NumPy arrays with repeated structure

numpy arrays python

Nico Schlömer David Maze · Tech community · 7 years ago

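I'd like to create NumPy arrays that have a repeated structure. A particular function (here, shuffle()) takes two numbers and returns an array (here of length 8, but it could be larger). These arrays are then concatenated.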

import numpy


def shuffle(a, b):
    return numpy.array([
        [+a, +b], [-a, +b], [+a, -b], [-a, -b],
        [+b, +a], [-b, +a], [+b, -a], [-b, -a],
        ])


pairs = [
    (0.1, 0.2),
    (3.14, 2.71), 
    # ... many, without a particular pattern ...
    (0.707, 0.577)
    ]
out = numpy.concatenate([shuffle(*pair) for pair in pairs])

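This produces the desired out. However, it becomes unnecessarily inefficient when there are many pairs (a, b), or when shuffle is replaced by something that returns more data.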

One way around this would be to hardcode out:

out = numpy.array([
    [+0.1, +0.2],
    [-0.1, +0.2],
    # ...
    [-0.2, -0.1],
    [+3.14, +2.71],
    # ...
    ])

but that is obviously not desirable either.

In C, I would probably use a macro that is expanded by the preprocessor.

4 answers | last activity 7 years ago

hpaulj 7 years ago

This:

[
    [+a, +b], [-a, +b], [+a, -b], [-a, -b],
    [+b, +a], [-b, +a], [+b, -a], [-b, -a],
]

is a Python list of lists; np.array(...) then converts that list into an array.

np.fromiter is an alternative that builds a 1-D array directly from an iterable.
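One practical difference between the two: np.array accepts the nested list and infers the (8, 2) shape, while np.fromiter with a plain scalar dtype such as float only returns a 1-D array, so the values have to be flattened first and reshaped afterwards. A minimal sketch using the question's first pair (0.1, 0.2); the variable names are illustrative only:

```python
import numpy as np

a, b = 0.1, 0.2
nested = [[+a, +b], [-a, +b], [+a, -b], [-a, -b],
          [+b, +a], [-b, +a], [+b, -a], [-b, -a]]

# np.array infers the (8, 2) shape from the nested list.
from_nested = np.array(nested)

# np.fromiter with a scalar dtype yields a flat array: flatten the rows,
# pass count so NumPy can allocate once, then reshape to 8x2.
flat_values = (x for row in nested for x in row)
from_iter = np.fromiter(flat_values, dtype=float, count=16).reshape(-1, 2)

print(np.array_equal(from_nested, from_iter))
```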

Is this step really that time-consuming?

Some timing explorations:

In [245]: timeit shuffle(1,2)
9.29 µs ± 12.3 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
...
In [248]: out=np.concatenate([shuffle(1,2) for _ in range(100)])
In [249]: out.shape
Out[249]: (800, 2)
In [250]: timeit out=np.concatenate([shuffle(1,2) for _ in range(100)])
1.02 ms ± 4.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
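The timeit magic above only works in IPython. A rough plain-script equivalent using the standard-library timeit module, assuming shuffle is defined as in the question; build is a hypothetical helper, and repeat=5, number=100 are arbitrary choices smaller than IPython's defaults. Timings vary by machine, so no numbers are quoted here:

```python
import timeit

import numpy as np


def shuffle(a, b):
    return np.array([
        [+a, +b], [-a, +b], [+a, -b], [-a, -b],
        [+b, +a], [-b, +a], [+b, -a], [-b, -a],
    ])


def build(n_pairs=100):
    # Same workload as In [250]: n_pairs small arrays, then one concatenate.
    return np.concatenate([shuffle(1, 2) for _ in range(n_pairs)])


# timeit.repeat returns one total per run; take the best run and divide by
# the number of calls in it to get the time per build() call.
runs = timeit.repeat(build, repeat=5, number=100)
best = min(runs) / 100
print(f"{best * 1e3:.3f} ms per call")
```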

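Here is an array of the same size, built with a much simpler construction. If it generated the right numbers, this would be close to the best speed one could hope for: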

In [251]: np.stack([np.arange(800),np.arange(800)],1).shape
Out[251]: (800, 2)
In [252]: timeit np.stack([np.arange(800),np.arange(800)],1).shape
21.4 µs ± 902 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

We could explore alternatives, but at some point you want to give priority to clarity. What is the clearest way of generating the desired array?

Dropping the intermediate array call, so that np.array is called only once on the whole nested list:

def shuffle1(a, b):
    return [
        [+a, +b], [-a, +b], [+a, -b], [-a, -b],
        [+b, +a], [-b, +a], [+b, -a], [-b, -a],
        ]

In [259]: timeit np.array([shuffle1(1,2) for _ in range(100)]).reshape(-1,2)
765 µs ± 14.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

1 ms vs. 0.765 ms: a modest speedup.
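To confirm that the single-call variant produces exactly the same array as the per-pair concatenate, a small check using the pairs from the question (the names per_pair and single_call are illustrative):

```python
import numpy as np


def shuffle(a, b):
    return np.array([
        [+a, +b], [-a, +b], [+a, -b], [-a, -b],
        [+b, +a], [-b, +a], [+b, -a], [-b, -a],
    ])


def shuffle1(a, b):
    return [
        [+a, +b], [-a, +b], [+a, -b], [-a, -b],
        [+b, +a], [-b, +a], [+b, -a], [-b, -a],
    ]


pairs = [(0.1, 0.2), (3.14, 2.71), (0.707, 0.577)]

# One small array per pair, then a concatenate.
per_pair = np.concatenate([shuffle(a, b) for a, b in pairs])
# One np.array call on an (n_pairs, 8, 2) nested list, then merge the first two axes.
single_call = np.array([shuffle1(a, b) for a, b in pairs]).reshape(-1, 2)

print(np.array_equal(per_pair, single_call))
```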

Using fromiter instead of np.array inside shuffle halves the time:

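def shuffle2(a, b):
    # fromiter builds a flat 1-D array; dtype float so that non-integer
    # pairs such as (0.1, 0.2) are not truncated. Then reshape to 8x2.
    return np.fromiter(
        [+a, +b, -a, +b, +a, -b, -a, -b,
         +b, +a, -b, +a, +b, -a, -b, -a,
         ], float).reshape(-1, 2)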

In [279]: timeit out=np.concatenate([shuffle2(1,2) for _ in range(100)])
503 µs ± 4.56 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Warren Weckesser 7 years ago

Here is a method that uses fancy indexing.

pairs is your sample input, stored in a NumPy array:

In [7]: pairs
Out[7]: 
array([[ 0.1  ,  0.2  ],
       [ 3.14 ,  2.71 ],
       [ 0.707,  0.577]])

pairspm is pairs with its negation appended column-wise, so each row has the form [a, b, -a, -b]:

In [8]: pairspm = np.hstack((pairs, -pairs))

indices is an array of indices into a row of the form [a, b, -a, -b] that reproduces the 8x2 pattern of shuffle(a, b):

In [9]: indices = np.array([[0, 1], [2, 1], [0, 3], [2, 3], [1, 0], [3, 0], [1, 2], [3, 2]])
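To see how indices encodes the pattern, it can be applied to a single row: positions 0, 1, 2, 3 hold a, b, -a, -b, and indexing a 1-D row with the (8, 2) integer array returns an (8, 2) result. A small illustration with a = 0.1, b = 0.2 (the variable row is illustrative):

```python
import numpy as np

a, b = 0.1, 0.2
# One row of pairspm: index 0 -> a, 1 -> b, 2 -> -a, 3 -> -b.
row = np.array([a, b, -a, -b])
indices = np.array([[0, 1], [2, 1], [0, 3], [2, 3],
                    [1, 0], [3, 0], [1, 2], [3, 2]])

# The result takes the shape of indices, (8, 2), and matches shuffle(a, b).
print(row[indices])
```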

out is pairspm[:, indices], reshaped to collapse its first two dimensions into one:

In [10]: out = pairspm[:, indices].reshape(-1, 2)

In [11]: out
Out[11]: 
array([[ 0.1  ,  0.2  ],
       [-0.1  ,  0.2  ],
       [ 0.1  , -0.2  ],
       [-0.1  , -0.2  ],
       [ 0.2  ,  0.1  ],
       [-0.2  ,  0.1  ],
       [ 0.2  , -0.1  ],
       [-0.2  , -0.1  ],
       [ 3.14 ,  2.71 ],
       [-3.14 ,  2.71 ],
       [ 3.14 , -2.71 ],
       [-3.14 , -2.71 ],
       [ 2.71 ,  3.14 ],
       [-2.71 ,  3.14 ],
       [ 2.71 , -3.14 ],
       [-2.71 , -3.14 ],
       [ 0.707,  0.577],
       [-0.707,  0.577],
       [ 0.707, -0.577],
       [-0.707, -0.577],
       [ 0.577,  0.707],
       [-0.577,  0.707],
       [ 0.577, -0.707],
       [-0.577, -0.707]])
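The fancy-indexing steps can be wrapped into one function and checked against the original loop-and-concatenate approach. A sketch; shuffle_all and INDICES are hypothetical names, and the input is converted to a float array first:

```python
import numpy as np

# Index pattern into a row [a, b, -a, -b], as in the answer.
INDICES = np.array([[0, 1], [2, 1], [0, 3], [2, 3],
                    [1, 0], [3, 0], [1, 2], [3, 2]])


def shuffle_all(pairs):
    """Hypothetical wrapper: the 8x2 pattern for every (a, b) row of pairs."""
    pairs = np.asarray(pairs, dtype=float)
    pairspm = np.hstack((pairs, -pairs))        # shape (n, 4)
    return pairspm[:, INDICES].reshape(-1, 2)   # (n, 8, 2) -> (8n, 2)


def shuffle(a, b):
    return np.array([
        [+a, +b], [-a, +b], [+a, -b], [-a, -b],
        [+b, +a], [-b, +a], [+b, -a], [-b, -a],
    ])


pairs = [(0.1, 0.2), (3.14, 2.71), (0.707, 0.577)]
expected = np.concatenate([shuffle(a, b) for a, b in pairs])
print(np.array_equal(shuffle_all(pairs), expected))
```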

Nick T twasbrillig 7 years ago

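import numpy as np


def gen(pairs):
    # Preallocate the full (8 * n_pairs, 2) result once.
    out = np.empty((8 * len(pairs), 2), dtype=float)
    for n, (a, b) in enumerate(pairs):
        # Fill this pair's 16 consecutive slots (8 rows x 2 columns)
        # through the flat iterator.
        out.flat[16*n:16*(n+1)] = [
            +a, +b, -a, +b, +a, -b, -a, -b,
            +b, +a, -b, +a, +b, -a, -b, -a,
        ]
    return out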

AGN Gazer 7 years ago

Here is another way to build the whole output at once, without stacking individual arrays:

import numpy as np
# generate some data:
pairs = np.random.randint(1, 100, (1000, 2))
# create "sign" array:
u = np.array([[[1, 1], [-1, 1], [1, -1], [-1, -1]]])
# broadcast (n, 1, 2) * (1, 4, 2) -> (n, 4, 2), then flatten to (4n, 2) rows:
out = (pairs[:, None, :] * u).reshape((-1, 2))

Timing:

%timeit (pairs[:, None, :] * u).reshape((-1, 2))
10000 loops, best of 3: 49 µs per loop
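As written, u only applies the four sign patterns to (a, b), so this yields 4 rows per pair rather than the 8 rows of shuffle; the swapped (b, a) rows are missing. A sketch extending the same broadcasting idea to the full pattern by first stacking each pair with its reversed copy; swapped and signs are illustrative names, and the result is checked against the loop version:

```python
import numpy as np


def shuffle(a, b):
    return np.array([
        [+a, +b], [-a, +b], [+a, -b], [-a, -b],
        [+b, +a], [-b, +a], [+b, -a], [-b, -a],
    ])


pairs = np.random.randint(1, 100, (1000, 2))
signs = np.array([[1, 1], [-1, 1], [1, -1], [-1, -1]])  # shape (4, 2)

# (n, 2, 2): for each pair, the rows (a, b) and (b, a).
swapped = np.stack([pairs, pairs[:, ::-1]], axis=1)
# (n, 2, 1, 2) * (4, 2) broadcasts to (n, 2, 4, 2), then collapses to (8n, 2).
out = (swapped[:, :, None, :] * signs).reshape(-1, 2)

expected = np.concatenate([shuffle(a, b) for a, b in pairs])
print(np.array_equal(out, expected))
```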