
Pandas: filter one DataFrame by multiple simultaneous column values from another DataFrame

pandas python

PeptideWitch · 3 years ago

I have a filtered DataFrame, correct_df (named other_df in the code below), and an original DataFrame, example_df.

import pandas as pd

example_df = pd.DataFrame({'Test': ['Test_1', 'Test_1', 'Test_1', 'Test_2', 'Test_2', 'Test_2', 'Test_3', 'Test_3', 'Test_3'], 'A': [1, 2, 3, 1, 2, 3, 1, 2, 3]})
other_df = pd.DataFrame({'Test': ['Test_1', 'Test_1', 'Test_3', 'Test_3'], 'A': [1, 2, 1, 3]})

Expected result:

I want the rows of example_df where both the 'Test' and 'A' column values match a row of correct_df.

I tried:

result = example_df.loc[ (example_df['Test'].isin(other_df['Test'])) & (example_df['A'].isin(other_df['A'])) ]
result

    Test    A
0   Test_1  1
1   Test_1  2
2   Test_1  3
6   Test_3  1
7   Test_3  2
8   Test_3  3

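However, because the two conditions are evaluated separately, each one is applied to its own column without linking them: the result satisfies the condition on 'Test' and, independently, the condition on 'A', rather than matching ('Test', 'A') as a pair. How can I write a .loc filter for a two-column condition?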
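To make the mismatch concrete, here is a minimal sketch that reuses the data from the question. The names test_ok, a_ok, both and pair_exists are introduced only for this illustration. It shows that row 2, ('Test_1', 3), passes both isin checks even though that pair never occurs in other_df.

```python
import pandas as pd

example_df = pd.DataFrame({
    'Test': ['Test_1', 'Test_1', 'Test_1', 'Test_2', 'Test_2', 'Test_2',
             'Test_3', 'Test_3', 'Test_3'],
    'A': [1, 2, 3, 1, 2, 3, 1, 2, 3],
})
other_df = pd.DataFrame({'Test': ['Test_1', 'Test_1', 'Test_3', 'Test_3'],
                         'A': [1, 2, 1, 3]})

# Each check looks at one column only.
test_ok = example_df['Test'].isin(other_df['Test'])  # True for Test_1 and Test_3 rows
a_ok = example_df['A'].isin(other_df['A'])           # True for every row: 1, 2 and 3 all occur

# Combining them with & does not pair the columns up row by row.
both = test_ok & a_ok
print(example_df[both])

# ('Test_1', 3) is kept above, yet no row of other_df holds that pair.
pair_exists = ((other_df['Test'] == 'Test_1') & (other_df['A'] == 3)).any()
print(pair_exists)
```

Since every value of 'A' appears somewhere in other_df, the second condition filters nothing here, so only the 'Test' condition has any effect.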

2 replies

jezrael · 3 years ago

Call DataFrame.reset_index before DataFrame.merge so that the original index values are not lost:

result = example_df.reset_index().merge(other_df, on=['Test','A'])
print (result)
   index    Test  A
0      0  Test_1  1
1      1  Test_1  2
2      6  Test_3  1
3      8  Test_3  3

result = (example_df.reset_index()
                    .merge(other_df, on=['Test','A'])
                    .set_index('index')
                    .rename_axis(None))
print (result)
     Test  A
0  Test_1  1
1  Test_1  2
6  Test_3  1
8  Test_3  3
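One caveat that goes beyond the data in the question: if other_df contained a repeated ('Test', 'A') pair, an inner merge would repeat the matching rows of example_df as well. A minimal sketch, assuming a hypothetical other_dup frame with one duplicated row, that drops duplicate keys before merging:

```python
import pandas as pd

example_df = pd.DataFrame({
    'Test': ['Test_1', 'Test_1', 'Test_1', 'Test_2', 'Test_2', 'Test_2',
             'Test_3', 'Test_3', 'Test_3'],
    'A': [1, 2, 3, 1, 2, 3, 1, 2, 3],
})
# Hypothetical variant of other_df in which ('Test_1', 1) appears twice.
other_dup = pd.DataFrame({'Test': ['Test_1', 'Test_1', 'Test_1', 'Test_3', 'Test_3'],
                          'A': [1, 1, 2, 1, 3]})

# Merging as-is repeats example_df row 0, once per matching row in other_dup.
naive = example_df.reset_index().merge(other_dup, on=['Test', 'A'])

# Dropping duplicate key pairs first keeps each example_df row at most once.
keys = other_dup[['Test', 'A']].drop_duplicates()
deduped = (example_df.reset_index()
                     .merge(keys, on=['Test', 'A'])
                     .set_index('index')
                     .rename_axis(None))
print(len(naive), len(deduped))
print(deduped)
```

With the other_df from the question there are no duplicate pairs, so this extra step changes nothing there.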

Another idea is to build a MultiIndex on both frames, compare them with Index.isin, and then filter with boolean indexing:

result = example_df[example_df.set_index(['Test','A']).index
                              .isin(other_df.set_index(['Test','A']).index)]
print (result)
     Test  A
0  Test_1  1
1  Test_1  2
6  Test_3  1
8  Test_3  3
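The same idea can be wrapped in a small helper, which is handy when the frames carry extra columns that should not take part in the match. This is only a sketch: filter_by_keys is a hypothetical name, and it builds the key index with pd.MultiIndex.from_frame instead of set_index.

```python
import pandas as pd

def filter_by_keys(df, other, keys):
    """Keep the rows of df whose values in `keys` occur together in some row of other."""
    left = pd.MultiIndex.from_frame(df[keys])
    right = pd.MultiIndex.from_frame(other[keys])
    # isin on a MultiIndex compares whole tuples, not individual columns.
    return df[left.isin(right)]

example_df = pd.DataFrame({
    'Test': ['Test_1', 'Test_1', 'Test_1', 'Test_2', 'Test_2', 'Test_2',
             'Test_3', 'Test_3', 'Test_3'],
    'A': [1, 2, 3, 1, 2, 3, 1, 2, 3],
})
other_df = pd.DataFrame({'Test': ['Test_1', 'Test_1', 'Test_3', 'Test_3'],
                         'A': [1, 2, 1, 3]})

result = filter_by_keys(example_df, other_df, ['Test', 'A'])
print(result)
```

Because the mask is a plain boolean array aligned by position, the original index of example_df is preserved without any reset_index step.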

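enke · 3 years ago

@jezrael's solution is absolutely fine for your problem. This is just another (somewhat more convoluted) way to get the same result using numpy.

We can filter example_df directly with a boolean array, which we build by checking which rows of example_df exist in other_df. To do that, we give the example_df array an extra axis so that numpy broadcasting against the other_df array produces a 3D comparison array. Then all and any reduce it to a 1D array, msk: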

msk = (example_df.to_numpy()[:, None]==other_df.to_numpy()).all(axis=2).any(axis=1)
out = example_df[msk]

Output:

     Test  A
0  Test_1  1
1  Test_1  2
6  Test_3  1
8  Test_3  3
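For anyone who finds the one-liner hard to follow, here is the same computation split into steps with the intermediate shapes spelled out. The names left, right, cmp and row_match are introduced only for this sketch.

```python
import pandas as pd

example_df = pd.DataFrame({
    'Test': ['Test_1', 'Test_1', 'Test_1', 'Test_2', 'Test_2', 'Test_2',
             'Test_3', 'Test_3', 'Test_3'],
    'A': [1, 2, 3, 1, 2, 3, 1, 2, 3],
})
other_df = pd.DataFrame({'Test': ['Test_1', 'Test_1', 'Test_3', 'Test_3'],
                         'A': [1, 2, 1, 3]})

left = example_df.to_numpy()   # shape (9, 2)
right = other_df.to_numpy()    # shape (4, 2)

# left[:, None] has shape (9, 1, 2); broadcasting against (4, 2) gives (9, 4, 2):
# cmp[i, j, k] is True when column k of example row i equals column k of other row j.
cmp = left[:, None] == right

# A pair of rows matches only if every column matches: reduce the column axis.
row_match = cmp.all(axis=2)    # shape (9, 4)

# An example row is kept if it matches at least one row of other_df.
msk = row_match.any(axis=1)    # shape (9,)

print(cmp.shape, row_match.shape, msk.shape)
print(example_df[msk])
```

Note that the intermediate array holds one element per (example row, other row, column) combination, so memory grows with the product of the two frame lengths. For large frames the merge or MultiIndex approaches are likely the safer choice.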
