
Find rows in a pandas DataFrame based on a complex condition

  •  0
  • steff  · 6 years ago

                        code       date type  strike  settlement
    id                                                          
    1195001   CBT_21_G2012_S 2012-01-04    P  101.50    0.015625
    1195093   CBT_21_G2012_S 2012-01-04    C  101.50   28.890625
    1194926   CBT_21_G2012_S 2012-01-04    C  102.00   28.390625
    1194944   CBT_21_G2012_S 2012-01-04    C  102.50   27.906250
    1195109   CBT_21_G2012_S 2012-01-04    P  102.50    0.015625
    1194905   CBT_21_G2012_S 2012-01-04    C  103.00   27.406250
    1195008   CBT_21_G2012_S 2012-01-04    P  103.50    0.015625
    1195123   CBT_21_G2012_S 2012-01-04    C  103.50   26.906250
    1194908   CBT_21_G2012_S 2012-01-04    C  104.00   26.390625
    1194980   CBT_21_G2012_S 2012-01-04    C  104.50   25.890625
    1195025   CBT_21_G2012_S 2012-01-04    P  104.50    0.015625
    1194981   CBT_21_G2012_S 2012-01-04    P  105.00    0.015625
    1195063   CBT_21_G2012_S 2012-01-04    C  105.00   25.390625
    1194960   CBT_21_G2012_S 2012-01-04    C  105.50   24.890625
    1195102   CBT_21_G2012_S 2012-01-04    P  105.50    0.015625
    1194989   CBT_21_G2012_S 2012-01-04    C  106.00   24.390625
    

    I need to find the rows where only type='P' or only type='C' exists for the same code, date and strike.

                        code       date type  strike  settlement
    id                                                          
    1194926   CBT_21_G2012_S 2012-01-04    C  102.00   28.390625
    1194905   CBT_21_G2012_S 2012-01-04    C  103.00   27.406250
    1194908   CBT_21_G2012_S 2012-01-04    C  104.00   26.390625
    1194989   CBT_21_G2012_S 2012-01-04    C  106.00   24.390625
    

    [Edit]

    1 answer  |  as of 6 years ago
  •  1
  •   jezrael    6 years ago

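    Use transform with nunique to count the distinct types in each (code, date, strike) group, compare the result with 1 using eq (==), and filter with boolean indexing: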

    # if the column contains other types as well, keep only C and P first
    # df = df[df['type'].isin(['C', 'P'])]
    
    df = df[df.groupby(['code', 'date', 'strike'])['type'].transform('nunique').eq(1)]
    print(df)
                       code        date type  strike  settlement
    id                                                          
    1194926  CBT_21_G2012_S  2012-01-04    C   102.0   28.390625
    1194905  CBT_21_G2012_S  2012-01-04    C   103.0   27.406250
    1194908  CBT_21_G2012_S  2012-01-04    C   104.0   26.390625
    1194989  CBT_21_G2012_S  2012-01-04    C   106.0   24.390625
    

    # detail: number of distinct types per group, computed on the original df (before filtering)
    print(df.groupby(['code', 'date', 'strike'])['type'].transform('nunique'))
    id
    1195001    2
    1195093    2
    1194926    1
    1194944    2
    1195109    2
    1194905    1
    1195008    2
    1195123    2
    1194908    1
    1194980    2
    1195025    2
    1194981    2
    1195063    2
    1194960    2
    1195102    2
    1194989    1
    Name: type, dtype: int64
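
    The filter above can be reproduced end to end with a minimal sketch like the following. It rebuilds only the first five rows of the question's data; the variable names keys, n_types, unpaired and same are illustrative and not part of the original answer. The groupby(...).filter(...) line is an alternative formulation that should select the same rows, though it is typically slower on large frames.

```python
import pandas as pd

# First five rows of the question's data, rebuilt by hand for illustration
df = pd.DataFrame(
    {
        "code": ["CBT_21_G2012_S"] * 5,
        "date": pd.to_datetime(["2012-01-04"] * 5),
        "type": ["P", "C", "C", "C", "P"],
        "strike": [101.5, 101.5, 102.0, 102.5, 102.5],
        "settlement": [0.015625, 28.890625, 28.390625, 27.906250, 0.015625],
    },
    index=pd.Index([1195001, 1195093, 1194926, 1194944, 1195109], name="id"),
)

keys = ["code", "date", "strike"]

# Distinct types per (code, date, strike) group, broadcast back to every row
n_types = df.groupby(keys)["type"].transform("nunique")

# Keep only the rows whose group contains a single type
unpaired = df[n_types.eq(1)]
print(unpaired)

# Alternative: let groupby.filter keep whole groups that pass the test
same = df.groupby(keys).filter(lambda g: g["type"].nunique() == 1)
```

    transform returns a Series aligned to the original index, which is why it can be used directly as a boolean mask.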
    

    To swap the type values afterwards, use map with a dictionary:

    df['type'] = df['type'].map({'C':'P', 'P':'C'})
    print(df)
                       code        date type  strike  settlement
    id                                                          
    1194926  CBT_21_G2012_S  2012-01-04    P   102.0   28.390625
    1194905  CBT_21_G2012_S  2012-01-04    P   103.0   27.406250
    1194908  CBT_21_G2012_S  2012-01-04    P   104.0   26.390625
    1194989  CBT_21_G2012_S  2012-01-04    P   106.0   24.390625
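
    One caveat with map, shown in the small sketch below: any value that is not a key of the dictionary becomes NaN. The value "X" and the names types, swap, mapped and kept are hypothetical, added only to illustrate the behaviour. If the column may hold other values, filling the gaps from the original column is one way to leave them untouched.

```python
import pandas as pd

# "X" is a hypothetical extra value, not present in the question's data
types = pd.Series(["C", "P", "X"], name="type")
swap = {"C": "P", "P": "C"}

# Values missing from the dictionary are mapped to NaN
mapped = types.map(swap)
print(mapped)

# Fall back to the original value wherever the mapping produced NaN
kept = types.map(swap).fillna(types)
print(kept)
```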