
Scoring a DataFrame by id

  •  1
  • Masterbuilder  · Tech community  · 6 years ago

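    I have a DataFrame indexed by date, and I am trying to give each accountid a score per category: a category column gets a value when that category occurs for the account on the index date. The resulting DataFrame should look like this.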

               accountid category  Smooth  Hard  Sharp  Narrow
    timestamp                                                 
    2018-03-29       101   Smooth       1   NaN    NaN     NaN
    2018-03-29       102     Hard     NaN     1    NaN     NaN
    2018-03-30       103   Narrow     NaN   NaN    NaN       1
    2018-04-30       104    Sharp     NaN   NaN      1     NaN
    2018-04-21       105   Narrow     NaN   NaN    NaN       1
    

    What is the best way to loop over the DataFrame for each accountid and assign a score to each unstacked category?

    Here is the script that creates the DataFrame.

    import numpy as np
    import pandas as pd

    # Date range from the original script (not used further in this snippet)
    idx = pd.date_range('02-28-2018', '04-29-2018')

    # The four category columns start out empty (real NaN, not the string 'NaN')
    df = pd.DataFrame(
        [['101', '2018-03-29', 'Smooth', np.nan, np.nan, np.nan, np.nan],
         ['102', '2018-03-29', 'Hard', np.nan, np.nan, np.nan, np.nan],
         ['103', '2018-03-30', 'Narrow', np.nan, np.nan, np.nan, np.nan],
         ['104', '2018-04-30', 'Sharp', np.nan, np.nan, np.nan, np.nan],
         ['105', '2018-04-21', 'Narrow', np.nan, np.nan, np.nan, np.nan]],
        columns=['accountid', 'timestamp', 'category', 'Smooth', 'Hard', 'Sharp', 'Narrow'])

    # Parse the timestamps and use them as the index
    df['timestamp'] = pd.to_datetime(df['timestamp'])
    df = df.set_index('timestamp')
    print(df)
    
1 answer  |  as of 6 years ago

  •  0
  •   Scott Boston    6 years ago

    You can use the str accessor with get_dummies:

    df[['accountid','category']].assign(**df['category'].str.get_dummies())
    

    Output:

               accountid category  Hard  Narrow  Sharp  Smooth
    timestamp                                                 
    2018-03-29       101   Smooth     0       0      0       1
    2018-03-29       102     Hard     1       0      0       0
    2018-03-30       103   Narrow     0       1      0       0
    2018-04-30       104    Sharp     0       0      1       0
    2018-04-21       105   Narrow     0       1      0       0
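
    For readers who want to run the approach end to end, here is a minimal self-contained sketch. It rebuilds the question's data with only the accountid, timestamp and category columns (an assumption made here for brevity, since get_dummies creates the indicator columns itself); the names dummies and scored are illustrative and not from the original post.

```python
import pandas as pd

# Rebuild the question's data without the placeholder category columns.
df = pd.DataFrame(
    {
        "accountid": ["101", "102", "103", "104", "105"],
        "timestamp": pd.to_datetime(
            ["2018-03-29", "2018-03-29", "2018-03-30", "2018-04-30", "2018-04-21"]
        ),
        "category": ["Smooth", "Hard", "Narrow", "Sharp", "Narrow"],
    }
).set_index("timestamp")

# One 0/1 indicator column per distinct category value.
dummies = df["category"].str.get_dummies()

# Attach the indicator columns to the two original columns.
# Both frames share the same index, so rows line up one to one.
scored = df[["accountid", "category"]].assign(**dummies)
print(scored)
```

    No explicit loop over accountid is needed: get_dummies builds every indicator column in one vectorized call.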
    

    To replace the 0s with NaN:

    import numpy as np

    df[['accountid','category']].assign(**df['category'].str.get_dummies())\
                                .replace(0, np.nan)
    

    Output:

               accountid category  Hard  Narrow  Sharp  Smooth
    timestamp                                                 
    2018-03-29       101   Smooth   NaN     NaN    NaN     1.0
    2018-03-29       102     Hard   1.0     NaN    NaN     NaN
    2018-03-30       103   Narrow   NaN     1.0    NaN     NaN
    2018-04-30       104    Sharp   NaN     NaN    1.0     NaN
    2018-04-21       105   Narrow   NaN     1.0    NaN     NaN
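
    One possible variation, sketched here as an assumption rather than as part of the original answer: DataFrame.replace(0, np.nan) acts on every column, so a numeric accountid of 0 would also become NaN. Masking only the indicator columns avoids that, and reindex can restore the column order shown in the question. The names wanted, dummies and result are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "accountid": ["101", "102", "103", "104", "105"],
        "timestamp": pd.to_datetime(
            ["2018-03-29", "2018-03-29", "2018-03-30", "2018-04-30", "2018-04-21"]
        ),
        "category": ["Smooth", "Hard", "Narrow", "Sharp", "Narrow"],
    }
).set_index("timestamp")

# Column order taken from the question's desired layout.
wanted = ["Smooth", "Hard", "Sharp", "Narrow"]

# Build the indicators, then force the desired column order.
# fill_value=0 covers a category that never occurs in the data.
dummies = df["category"].str.get_dummies().reindex(columns=wanted, fill_value=0)

# Keep the 1s and turn everything else into NaN, touching only the
# indicator columns (the result is float because NaN is a float).
dummies = dummies.where(dummies == 1)

result = df[["accountid", "category"]].assign(**dummies)
print(result)
```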