
Creating dummy variables from a list

  •  0
  • user3476463  · Tech Community  · 5 years ago

    Lst = ['loan', 'Borrower', 'debts']
    

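    For each entry in the list, I would like to create a binary flag for an Id that is 1 when the string in the Notes column contains that entry and 0 otherwise. Can anyone suggest how to do this?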

    Data:

    print(data_df[['Id','Notes']][:10])
    
         Id                                              Notes
    59    60   568549 added on 11/04/09 > I use my current l...     
    76    77  I would like to use this loan to consolidate c...
    88    89    Borrower added on 06/28/10 > I would really ...
    229  230  I just got married and ran up some debt during...
    

    Desired output:

         Id                                              Notes  loan  Borrower  debts
    59    60   568549 added on 11/04/09 > I use my current l...     0         0      0
    76    77  I would like to use this loan to consolidate c...     1         0      0
    88    89    Borrower added on 06/28/10 > I would really ...     0         1      0
    229  230  I just got married and ran up some debt during...     0         0      1
    
    2 replies
        1
  •  1
  •   BENY    5 years ago

    Use str.findall, then get_dummies:

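    import pandas as pd

    Lst = ['loan', 'Borrower', 'debts']
    df = pd.DataFrame({'Note': ['loan lll', 'llll Borrower', '......debts']})

    # Find the list entries in each note, keep the first match, and one-hot encode it
    df.Note.str.findall('|'.join(Lst)).str[0].str.get_dummies()
    Out[639]: 
       Borrower  debts  loan
    0         0      0     1
    1         1      0     0
    2         0      1     0

    # Attach the flag columns to the original frame
    yourdf = pd.concat([df, df.Note.str.findall('|'.join(Lst)).str[0].str.get_dummies()], axis=1)
    yourdf
    Out[640]: 
                Note  Borrower  debts  loan
    0       loan lll         0      0     1
    1  llll Borrower         1      0     0
    2    ......debts         0      1     0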
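
    Because .str[0] keeps only the first match, a note that mentions two entries from the list gets a single flag. The following is a minimal sketch of one way around that, not part of the original answer: it joins all matches with '|' before calling get_dummies. The sample notes and the names pattern, flags, and result are made up for illustration.

```python
import re
import pandas as pd

Lst = ['loan', 'Borrower', 'debts']

# Hypothetical sample data: the second note mentions two entries,
# the third mentions none.
df = pd.DataFrame({'Note': ['loan lll', 'Borrower has debts', 'nothing here']})

# Escape each entry so it is matched literally, then OR them together.
pattern = '|'.join(re.escape(word) for word in Lst)

# findall returns a list of matches per row. Joining them with '|' lets
# str.get_dummies (which splits on '|' by default) flag every match.
flags = df['Note'].str.findall(pattern).str.join('|').str.get_dummies()

# Give every entry a column, in list order, even if it never occurs.
flags = flags.reindex(columns=Lst, fill_value=0)

result = pd.concat([df, flags], axis=1)
print(result)
```

    Note that this still matches substrings, so 'loan' would also flag a note containing 'loans'.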
    
        2
  •  0
  •   John Ketterer    5 years ago

    <dataframe>['new column name'] = <dataframe>['some existing column name'].apply(<some function>)
    

    More specifically, in your case:

    # apply passes each note to the lambda as a plain Python string, so test with `in`
    data_df['loan'] = data_df.Notes.apply(lambda x: 1 if 'loan' in x else 0)
    data_df['Borrower'] = data_df.Notes.apply(lambda x: 1 if 'Borrower' in x else 0)
    data_df['debt'] = data_df.Notes.apply(lambda x: 1 if 'debt' in x else 0)
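
    The three per-keyword assignments above can also be written as a loop over the list. This is a rough sketch rather than the answerer's code: it swaps the row-wise apply for the vectorized Series.str.contains, and the sample rows (including the missing note) are invented for illustration.

```python
import pandas as pd

Lst = ['loan', 'Borrower', 'debts']

# Hypothetical stand-in for data_df, including one missing note.
data_df = pd.DataFrame({
    'Id': [60, 77, 89, 230],
    'Notes': [
        'I use my current card',
        'I would like to use this loan to pay off debts',
        'Borrower added on 06/28/10',
        None,
    ],
})

for word in Lst:
    # regex=False treats the entry as a literal substring;
    # na=False counts a missing note as "no match".
    data_df[word] = (
        data_df['Notes'].str.contains(word, regex=False, na=False).astype(int)
    )

print(data_df)
```

    Each list entry becomes its own 0/1 column, and a note containing several entries is flagged in all of them.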