
Pandas DataFrame rows won't drop

  • William · 3 years ago

    Background: I'm trying to drop rows from a pandas DataFrame (exceptions_df) if all 3 conditions are met.

    Conditions:

    1. The Ownership Audit Note column value contains the partial string ignore or Ignore.
    2. The Entity ID % column value == the Account # % column value (this column is float64).
    3. The % Ownership column value == 100 (this column is float64).

    Extract from the DataFrame:

      % Ownership     Ownership Audit Note       Entity ID %     Account # %  
    0 100.00          [ignore] 100% Ownership    0.0000000       0.0000000  
    1 100.00          [ignore] 100% Ownership    0.0000000       0.0000000  
    2 100.00          [ignore] 100% Ownership    0.0000000       0.0000000  
    3 100.00          [ignore] 100% Ownership    0.0000000       0.0000000    
    4 100.00          [ignore] 100% Ownership    0.0000000       0.0000000    
    5 100.00          [ignore] 100% Ownership    1.0000000       1.0000000  
    8 100.00          [ignore] 100% Ownership    0.0000234       0.0000234  
    9 100.00          [ignore] 100% Ownership    0.0000000       0.0000000
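A minimal sketch of the intended logic, assuming the goal is to drop a row only when all three conditions hold at the same time. It rebuilds a few rows of the extract above plus one hypothetical row (index 10, invented for illustration) that should survive, builds one boolean mask per condition, and keeps the complement of their conjunction. Two assumptions are swapped in: case=False replaces listing both spellings of "ignore" (it also matches other capitalizations), and condition 2 compares the two columns with each other using exact float equality; if the values can differ by rounding, numpy.isclose may be a safer test.

```python
import pandas as pd

# A few rows from the extract above, plus a hypothetical row 10 that should be kept.
exceptions_df = pd.DataFrame(
    {
        "% Ownership": [100.0, 100.0, 100.0, 100.0],
        "Ownership Audit Note": [
            "[ignore] 100% Ownership",
            "[ignore] 100% Ownership",
            "[ignore] 100% Ownership",
            "Reviewed manually",  # hypothetical note without "ignore"
        ],
        "Entity ID %": [0.0, 1.0, 0.0000234, 0.0],
        "Account # %": [0.0, 1.0, 0.0000234, 0.0],
    },
    index=[0, 5, 8, 10],
)

# One boolean Series per condition, each built as its own expression.
note_has_ignore = exceptions_df["Ownership Audit Note"].str.contains("ignore", case=False)
ids_match = exceptions_df["Entity ID %"] == exceptions_df["Account # %"]
fully_owned = exceptions_df["% Ownership"] == 100

# Drop a row only when all three conditions are true together.
to_drop = note_has_ignore & ids_match & fully_owned
exceptions_df = exceptions_df[~to_drop]

print(exceptions_df)
```

Only the hypothetical row 10 remains, because every row taken from the extract meets all three conditions.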
    

    My code:

    exceptions_df = exceptions_df[~exceptions_df['Ownership Audit Note'].str.contains('ignore'|'Ignore') & 
                                 [~exceptions_df['% Ownership'] == 100] & 
                                 [~exceptions_df['Account # %'] == 'Entity ID %']]
    

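    Problem: I seem to be getting the following TypeError, which refers to the line of code above. Am I missing something obvious? Strangely, if I include only the first condition (the first line of code), it works fine!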

    TypeError: unsupported operand type(s) for |: 'str' and 'str'
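A small sketch of where this error comes from, using a hypothetical three-value Series. Python evaluates 'ignore'|'Ignore' before pandas ever sees it, and str does not implement the | operator. In str.contains the alternation has to live inside one regex string; passing case=False is an alternative that matches any capitalization.

```python
import pandas as pd

notes = pd.Series(["[ignore] 100% Ownership", "[Ignore] 50%", "keep"])

# 'ignore' | 'Ignore' is evaluated by Python first, and str has no | operator.
error_message = None
try:
    "ignore" | "Ignore"
except TypeError as exc:
    error_message = str(exc)
print(error_message)

# Put the alternation inside a single regex pattern...
regex_hits = notes.str.contains("ignore|Ignore").tolist()
# ...or match case-insensitively.
nocase_hits = notes.str.contains("ignore", case=False).tolist()

print(regex_hits)
print(nocase_hits)
```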
    
  •  Matthew Borish · 3 years ago

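    You need to remove the inner quotes in .contains(): pass the single regex pattern 'ignore|Ignore' rather than 'ignore'|'Ignore'. Each condition also needs its own parentheses. For example, with a dummy df: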

    import pandas as pd
    
    exceptions_dict = {'% Ownership': {0: 100.0,
      1: 100.0,
      2: 100.0,
      3: 100.0,
      4: 100.0,
      5: 100.0,
      6: 100.0,
      7: 100.0,
      8: 90.0,
      9: 100.0},
     'Ownership Audit Note': {0: '[ignore] 100% Ownership',
      1: '[ignore] 100% Ownership',
      2: '[ignore] 100% Ownership',
      3: '[ignore] 100% Ownership',
      4: '[ignore] 100% Ownership',
      5: '[ignore] 100% Ownership',
      6: '[ignore] 100% Ownership',
      7: '[ignore] 100% Ownership',
      8: 'foo',
      9: 'foo'},
     'Entity ID %': {0: 0.0,
      1: 0.0,
      2: 0.0,
      3: 0.0,
      4: 0.0,
      5: 1.0,
      6: 2.34e-05,
      7: 0.0,
      8: 1.0,
      9: 1.0},
     'Account # %': {0: 0.0,
      1: 0.0,
      2: 0.0,
      3: 0.0,
      4: 0.0,
      5: 1.0,
      6: 2.34e-05,
      7: 0.0,
      8: 2.0,
      9: 2.0}}
    
    exceptions_df = pd.DataFrame(exceptions_dict)
    
    exceptions_df = exceptions_df[(~(exceptions_df['Ownership Audit Note'].str.contains('ignore|Ignore'))) & 
                                    (~(exceptions_df['% Ownership'] == 100.0)) & 
                                    (~(exceptions_df['Account # %'] == 'Entity ID %'))]
    
    print(exceptions_df)
    
        % Ownership Ownership Audit Note    Entity ID % Account # %
    8   90.0        foo                     1.0       2.0
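One caveat on the mask above, assuming the goal is the one stated in the question (drop a row only when all three conditions hold). ~A & ~B & ~C keeps only rows that fail every condition, so it also drops rows that meet just one of them, such as dummy row 9 (note 'foo' but % Ownership of 100). By De Morgan's law, the complement of "all three hold" is ~(A & B & C). Separately, comparing 'Account # %' with the string 'Entity ID %' is always False for a float column; comparing the two columns is probably what was meant. A sketch on a trimmed copy of the dummy data:

```python
import pandas as pd

# Trimmed copy of the dummy data above (rows 0, 8 and 9).
df = pd.DataFrame(
    {
        "% Ownership": [100.0, 90.0, 100.0],
        "Ownership Audit Note": ["[ignore] 100% Ownership", "foo", "foo"],
        "Entity ID %": [0.0, 1.0, 1.0],
        "Account # %": [0.0, 2.0, 2.0],
    },
    index=[0, 8, 9],
)

has_ignore = df["Ownership Audit Note"].str.contains("ignore|Ignore")
fully_owned = df["% Ownership"] == 100.0
ids_match = df["Account # %"] == df["Entity ID %"]  # column vs column, not a string

# Same shape as the mask above: keeps rows that fail *every* condition.
fails_all = df[~has_ignore & ~fully_owned & ~ids_match]

# Keeps every row unless *all three* conditions hold (the question's stated goal).
not_all_three = df[~(has_ignore & fully_owned & ids_match)]

print(fails_all.index.tolist())
print(not_all_three.index.tolist())
```

Row 9 is dropped by the first mask but kept by the second, which is the difference to check against the intended behavior.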
    
  •  wwnde · 3 years ago

    You're using the wrong brackets to group the conditions. Try:

    exceptions_df = exceptions_df[(~(exceptions_df['Ownership Audit Note'].str.contains('ignore|Ignore'))) & 
                                  (~(exceptions_df['% Ownership'] == 100)) & 
                                  (~(exceptions_df['Account # %'] == 'Entity ID %'))]
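Why the grouping matters: in Python, & binds more tightly than ==, and unary ~ binds more tightly than both. So df['% Ownership'] == 100 & other parses as df['% Ownership'] == (100 & other), and ~df['% Ownership'] == 100 inverts the float column before comparing it, which is not a valid operation on float64 data. Wrapping each comparison in its own parentheses avoids both. The sketch below uses the standard ast module to show how Python parses each form; a and b are placeholder names, and top_node is a hypothetical helper written for this illustration.

```python
import ast


def top_node(expr: str) -> str:
    """Name of the outermost AST node Python builds for `expr`."""
    return type(ast.parse(expr, mode="eval").body).__name__


# Without parentheses, == is the outermost operation: a == (100 & b).
print(top_node("a == 100 & b"))
rhs = type(ast.parse("a == 100 & b", mode="eval").body.comparators[0]).__name__
print(rhs)  # the right-hand side of == is the BinOp 100 & b

# With parentheses, & combines a finished comparison with b.
print(top_node("(a == 100) & b"))

# ~ applies to the column first: (~a) == 100.
print(top_node("~a == 100"))
lhs = type(ast.parse("~a == 100", mode="eval").body.left).__name__
print(lhs)  # the left-hand side of == is the UnaryOp ~a

# With parentheses, ~ applies to the comparison result.
print(top_node("~(a == 100)"))
```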