
Speeding up nested loop comparison in Python

loops python


colt.exe · 4 years ago

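I have two arrays, arr1 and arr2, with shapes (90000,1) and (120000,1). I want to find out whether any element along axis=0 of arr1 is also present in arr2, write the positions of those elements to a list, and then remove them. This ensures that no element of either array can be found in the other. For now, I'm using for loops: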

list_conflict=[]
for i in range (len(arr1)):
    for j in range (len(arr2)):
        if (arr1[i]==arr2[j]):
            list_conflict.append([i,j])

fault_index_pos = np.unique([x[0] for x in list_conflict])
fault_index_neg = np.unique([x[1] for x in list_conflict])

X_neg = np.delete(X_neg,fault_index_neg,axis=0)
X_pos = np.delete(X_pos,fault_index_pos,axis=0)

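I loop over every element of arr1 and compare it with every element of arr2. In each entry of list_conflict, the first element is the index in arr1 and the second is the index in arr2. Then fault_index_pos and fault_index_neg are reduced to their unique elements, since an element of arr1 may appear in multiple places in arr2 and the lists would otherwise contain repeated positions. Finally, the matching elements are removed with np.delete, using the fault_index lists as the indices to delete.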

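I'm looking for a faster approach to this conflict comparison, whether that is multiprocessing, vectorization, or anything else. You could say this won't take much time, but the arrays actually have shape (x,8,10); I reduced the dimensions for the sake of clarity.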


Tim Peters · 4 years ago

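Ignoring the numpy part, the conflicting index pairs can be found much faster in pure Python, in time proportional to len(a) plus len(b) plus the number of conflicts, rather than the nested loops, which take time proportional to the product of the two lengths: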

def conflicts(a, b):
    from collections import defaultdict
    elt2ix = defaultdict(list)
    for i, elt in enumerate(a):
        elt2ix[elt].append(i)
    for j, elt in enumerate(b):
        if elt in elt2ix:
            for i in elt2ix[elt]:
                yield i, j

Then, for example,

for pair in conflicts([1, 2, 4, 5, 2], [2, 3, 8, 4]):
    print(pair)

displays

(1, 0)
(4, 0)
(2, 3)

These are the indices at which the matching occurrences of 2 and 4 appear.
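To connect this back to the question, here is a minimal sketch of feeding (n, 1) NumPy arrays into conflicts and then deleting the conflicting rows. The small arr1 and arr2 are hypothetical stand-ins for the real data, and it assumes each row holds a single scalar, so that ravel().tolist() yields hashable Python values:

```python
from collections import defaultdict

import numpy as np


def conflicts(a, b):
    """Yield (i, j) index pairs with a[i] == b[j]; items must be hashable."""
    elt2ix = defaultdict(list)
    for i, elt in enumerate(a):
        elt2ix[elt].append(i)
    for j, elt in enumerate(b):
        if elt in elt2ix:
            for i in elt2ix[elt]:
                yield i, j


# Hypothetical small stand-ins for the (90000, 1) and (120000, 1) arrays.
arr1 = np.array([[1], [2], [4], [5], [2]])
arr2 = np.array([[2], [3], [8], [4]])

# ravel().tolist() turns each (n, 1) array into a flat list of Python scalars,
# which are hashable and can therefore be used as dictionary keys.
pairs = list(conflicts(arr1.ravel().tolist(), arr2.ravel().tolist()))

# astype(int) keeps the index arrays usable even when there are no conflicts.
fault_index_pos = np.unique([i for i, _ in pairs]).astype(int)
fault_index_neg = np.unique([j for _, j in pairs]).astype(int)

arr1_clean = np.delete(arr1, fault_index_pos, axis=0)
arr2_clean = np.delete(arr2, fault_index_neg, axis=0)
```

With these inputs, arr1_clean keeps the rows holding 1 and 5, and arr2_clean keeps the rows holding 3 and 8.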

Trenton McKinney, ivirshup · 4 years ago

With pandas, this takes four lines of code:

import numpy as np
import pandas as pd

# create test data
np.random.seed(1)
a = np.random.randint(10, size=(10, 1))
np.random.seed(1)
b = np.random.randint(8, 15, size=(10, 1))

# create dataframe
df_a = pd.DataFrame(a)
df_b = pd.DataFrame(b)

# find unique values in df_a
unique_a = df_a[0].unique().tolist()

# create a Boolean mask and return only values of df_b not found in df_a
values_not_in_a = df_b[~df_b[0].isin(unique_a)].to_numpy()

a = array([[5],
           [8],
           [9],
           [5],
           [0],
           [0],
           [1],
           [7],
           [6],
           [9]])

b = array([[13],
           [11],
           [12],
           [ 8],
           [ 9],
           [11],
           [13],
           [ 8],
           [ 8],
           [ 9]])

# final output array
values_not_in_a = array([[13],
                         [11],
                         [12],
                         [11],
                         [13]])

Using only numpy

import numpy as np

# create test data
np.random.seed(1)
a = np.random.randint(10, size=(10, 1))
np.random.seed(1)
b = np.random.randint(8, 15, size=(10, 1))

ua = np.unique(a)  # unique values of a
ub = np.unique(b)  # unique values of b

mask_b = np.isin(b, ua, invert=True)
mask_a = np.isin(a, ub, invert=True)

b_values_not_in_a = b[mask_b]
a_values_not_in_b = a[mask_a]

# b_values_not_in_a
array([13, 11, 12, 11, 13])

# a_values_not_in_b
array([5, 5, 0, 0, 1, 7, 6])
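The question asks for the positions of the conflicts so that rows can be dropped from X_pos and X_neg. As a rough sketch of how the np.isin masks could provide that, the example below uses invented data; only the names arr1, arr2, X_pos, X_neg, fault_index_pos and fault_index_neg come from the question, and it assumes arr1 lines up row-for-row with X_pos and arr2 with X_neg:

```python
import numpy as np

# Hypothetical stand-ins: arr1/arr2 are the key columns, X_pos/X_neg the
# arrays to prune (contents invented for illustration).
arr1 = np.array([[5], [8], [9], [5], [0]])
arr2 = np.array([[13], [8], [9], [11]])
X_pos = np.arange(10).reshape(5, 2)
X_neg = np.arange(8).reshape(4, 2)

# Boolean masks over rows: True where the row's value occurs in the other array.
conflict_1 = np.isin(arr1.ravel(), arr2.ravel())
conflict_2 = np.isin(arr2.ravel(), arr1.ravel())

# Row positions of the conflicts, comparable to fault_index_pos / fault_index_neg.
fault_index_pos = np.flatnonzero(conflict_1)
fault_index_neg = np.flatnonzero(conflict_2)

# Keep only the rows that are not in conflict.
X_pos_clean = X_pos[~conflict_1]
X_neg_clean = X_neg[~conflict_2]
```

Indexing with the inverted mask plays the same role here as calling np.delete with the index arrays.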

`timeit`

# using the following arrays
np.random.seed(1)
a = np.random.randint(10, size=(90000, 1))
np.random.seed(1)
b = np.random.randint(8, 15, size=(120000, 1))

%%timeit
5.6 ms ± 151 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Prune · 4 years ago

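Please work through some tutorials on NumPy's vectorization capabilities and on Python's sequence-membership operators. You are trying to write a large-scale application that badly needs language facilities you haven't learned yet.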

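Probably the fastest way to do this is to convert each sequence to a set and then take the set intersection. The operations involved are O(N) for a sequence/set of N elements; your nested loop is O(N*M) (in the two sequence sizes).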

0x5453, Yuki · 4 years ago

As @Prune suggested, here is a solution using sets:

overlap = np.array(list(set(arr1) & set(arr2)))  # Depending on array shapes you may need to flatten or slice first
arr1 = arr1[~np.isin(arr1, overlap)]
arr2 = arr2[~np.isin(arr2, overlap)]
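The question mentions that the real arrays have shape (x,8,10), where each entry along axis 0 is itself an array and cannot be placed in a set directly. One possible way to adapt the set idea, sketched here under assumptions, is to hash each entry by its raw bytes. The helper row_keys and the sample data are hypothetical. The sketch assumes both arrays share the same dtype, and that byte-for-byte equality is the intended notion of a match (for floats, 0.0 and -0.0 would not be treated as equal).

```python
import numpy as np


def row_keys(arr):
    """Return one hashable bytes key per entry along axis 0 (hypothetical helper)."""
    n = arr.shape[0]
    width = int(np.prod(arr.shape[1:]))
    flat = np.ascontiguousarray(arr).reshape(n, width)
    return [row.tobytes() for row in flat]


# Invented sample data: the first entry of X_neg duplicates X_pos[1].
X_pos = np.arange(12).reshape(3, 2, 2)
X_neg = np.stack([X_pos[1], X_pos[1] + 100])

keys_pos = row_keys(X_pos)
keys_neg = row_keys(X_neg)
overlap = set(keys_pos) & set(keys_neg)

# Boolean masks marking the entries that appear in only one of the arrays.
keep_pos = np.array([k not in overlap for k in keys_pos], dtype=bool)
keep_neg = np.array([k not in overlap for k in keys_neg], dtype=bool)

X_pos_clean = X_pos[keep_pos]
X_neg_clean = X_neg[keep_neg]
```

Building the keys is linear in the total amount of data, so the overall cost stays far below that of the nested loops.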