
Create dummy variables from a list

pandas python-3.x

user3476463 · 5 years ago

Lst = ['loan', 'Borrower', 'debts']

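I have the list above. For each entry in the list, I would like to create a binary flag column that is 1 when the string in the Notes column contains that entry and 0 otherwise. Can anyone suggest how to do this?

Data: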

print(data_df[['Id','Notes']][:10])

     Id                                              Notes
59    60   568549 added on 11/04/09 > I use my current l...     
76    77  I would like to use this loan to consolidate c...
88    89    Borrower added on 06/28/10 > I would really ...
229  230  I just got married and ran up some debt during...

Output:

     Id                                              Notes  loan  Borrower  debts
59    60   568549 added on 11/04/09 > I use my current l...     0         0      0
76    77  I would like to use this loan to consolidate c...     1         0      0
88    89    Borrower added on 06/28/10 > I would really ...     0         1      0
229  230  I just got married and ran up some debt during...     0         0      1


BENY · 5 years ago

Use str.findall, then get_dummies:

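import pandas as pd

Lst = ['loan', 'Borrower', 'debts']
# Sample input
df = pd.DataFrame({'Note': ['loan lll', 'llll Borrower', '......debts']})

# Find the keywords in each note; .str[0] keeps only the first match per row
df.Note.str.findall('|'.join(Lst)).str[0].str.get_dummies()
Out[639]: 
   Borrower  debts  loan
0         0      0     1
1         1      0     0
2         0      1     0

# Attach the flag columns to the original frame
yourdf = pd.concat([df, df.Note.str.findall('|'.join(Lst)).str[0].str.get_dummies()], axis=1)
yourdf
Out[640]: 
            Note  Borrower  debts  loan
0       loan lll         0      0     1
1  llll Borrower         1      0     0
2    ......debts         0      1     0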
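
Because `.str[0]` keeps only the first match, a note that mentions two keywords would get a single flag. A minimal sketch of one possible variant that flags every keyword found is shown here; the sample rows, the `re.escape` call, and the `reindex` step are assumptions added for illustration and are not part of the answer above.

```python
import re
import pandas as pd

Lst = ['loan', 'Borrower', 'debts']
# Hypothetical sample data: the second row contains two keywords,
# the third contains none, and the fourth is missing
df = pd.DataFrame({'Note': ['loan lll', 'Borrower has debts', 'nothing here', None]})

# Escape each keyword so it is matched literally inside the regex
pattern = '|'.join(re.escape(word) for word in Lst)

flags = (
    df['Note']
    .str.findall(pattern)   # list of every keyword found in each note
    .str.join('|')          # ['Borrower', 'debts'] -> 'Borrower|debts'
    .str.get_dummies()      # one 0/1 column per distinct keyword
    .reindex(columns=Lst, fill_value=0)  # list order; adds keywords never matched
)

result = pd.concat([df, flags], axis=1)
print(result)
```

Rows with no match, or with a missing note, end up with 0 in every flag column.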

John Ketterer · 5 years ago

<dataframe>['new column name'] = <dataframe>['some existing column name'].apply(<some function>)

More specifically, in your case:

# apply passes each note as a plain Python string, so test membership with `in`
data_df['loan'] = data_df.Notes.apply(lambda x: 1 if 'loan' in x else 0)
data_df['Borrower'] = data_df.Notes.apply(lambda x: 1 if 'Borrower' in x else 0)
data_df['debt'] = data_df.Notes.apply(lambda x: 1 if 'debt' in x else 0)
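
The same idea can be written as a loop over the list, which avoids repeating one line per keyword. The following is a rough sketch, assuming a small stand-in for `data_df` (the sample rows are hypothetical); it uses the vectorized `Series.str.contains` instead of `apply`, with `na=False` so that missing notes count as "no match".

```python
import pandas as pd

Lst = ['loan', 'Borrower', 'debts']
# Hypothetical stand-in for the asker's data_df
data_df = pd.DataFrame({
    'Id': [60, 77, 89],
    'Notes': ['I use my current card', 'I would like to use this loan', None],
})

for word in Lst:
    # regex=False: plain substring match; na=False: a missing note gives False
    data_df[word] = data_df['Notes'].str.contains(word, regex=False, na=False).astype(int)

print(data_df)
```

This is a case-sensitive substring match, so 'loan' would also match 'loans'.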