
How do I use groupby on a single column and compare multiple columns in Pandas?

  •  2
  • Jane Sully  ·  6 years ago

    Users    |    Signed_up    |     Prediction   |
    User1         1                  0            
    User2         0                  0
    User1         1                  1
    User3         1                  1
    User2         0                  1
    User2         0                  0
    ...
    
    For TP, the resulting table might look something like:
    
    Users    |    TP    |
    User1         1
    User2         0
    User3         1
    
    For TN, the resulting table might look something like:
    Users    |    TN    |
    User1         0
    User2         1
    User3         0
    
    and so on for FP and FN.
    

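    I think I need to group by the Users column and use a lambda function to compare the Signed_up and Prediction columns, but I don't know how to actually do this. I would appreciate any help!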

    3 answers  |  as of 6 years ago
        1
  •  4
  •   ALollz    6 years ago

    Do the comparisons before the groupby, then use groupby + sum:

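    # Flag each row as TP/TN/FN/FP, then sum the flags per user.
    # Note the double brackets: newer pandas versions reject a bare tuple of column names.
    (df.assign(TP = df.Signed_up & df.Prediction, 
               TN = (df.Signed_up == 0) & (df.Prediction == 0),
               FN = df.Signed_up & (df.Prediction == 0), 
               FP = (df.Signed_up == 0) & df.Prediction)
       .groupby('Users')[['TP', 'TN', 'FN', 'FP']].sum())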
    
           TP   TN   FN   FP
    Users                   
    User1   1  0.0  1.0  0.0
    User2   0  2.0  0.0  1.0
    User3   1  0.0  0.0  0.0
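
    For a version that runs end to end, here is a minimal sketch of the same compare-then-groupby idea. The DataFrame is rebuilt from the six sample rows shown in the question (whatever sits behind the "..." is unknown and left out), and the helper names actual and predicted are introduced only for illustration.

```python
import pandas as pd

# Sample rows copied from the question; rows hidden behind "..." are omitted.
df = pd.DataFrame({
    "Users": ["User1", "User2", "User1", "User3", "User2", "User2"],
    "Signed_up": [1, 0, 1, 1, 0, 0],
    "Prediction": [0, 0, 1, 1, 1, 0],
})

# Illustrative helper masks (not part of the original answer).
actual = df["Signed_up"].eq(1)
predicted = df["Prediction"].eq(1)

counts = (
    df.assign(
        TP=actual & predicted,      # signed up and predicted to sign up
        TN=~actual & ~predicted,    # did not sign up, predicted not to
        FN=actual & ~predicted,     # signed up, but predicted not to
        FP=~actual & predicted,     # did not sign up, but predicted to
    )
    .groupby("Users")[["TP", "TN", "FN", "FP"]]
    .sum()                          # True counts as 1, so sum() counts rows
)
print(counts)
```

    Because every flag is boolean here, all four columns come out with the same dtype, and counts["TP"] alone gives the per-user TP table sketched in the question.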
    

    Inspired by @BrianJoseph, and with less typing, you can groupby all 3 columns, take the size, and unstack everything other than Users:

    df.groupby([*df]).size().unstack([1,2]).fillna(0)
    
    Signed_up     1         0     
    Prediction    0    1    0    1
    Users                         
    User1       1.0  1.0  0.0  0.0
    User2       0.0  0.0  2.0  1.0
    User3       0.0  1.0  0.0  0.0
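
    The unstacked columns are (Signed_up, Prediction) pairs rather than TP/TN/FN/FP. One possible way to label them is sketched below, assuming the six sample rows from the question; the labels mapping is a hypothetical helper, and fill_value=0 is passed to unstack so the counts stay integers instead of going through fillna.

```python
import pandas as pd

# Sample rows copied from the question; rows hidden behind "..." are omitted.
df = pd.DataFrame({
    "Users": ["User1", "User2", "User1", "User3", "User2", "User2"],
    "Signed_up": [1, 0, 1, 1, 0, 0],
    "Prediction": [0, 0, 1, 1, 1, 0],
})

# Count rows per (Users, Signed_up, Prediction), then move the last two
# index levels into the columns. fill_value=0 keeps integer counts.
table = (
    df.groupby(["Users", "Signed_up", "Prediction"])
      .size()
      .unstack([1, 2], fill_value=0)
)

# Hypothetical mapping from (Signed_up, Prediction) pairs to outcome names.
labels = {(1, 1): "TP", (0, 0): "TN", (1, 0): "FN", (0, 1): "FP"}
table.columns = [labels[(int(s), int(p))] for s, p in table.columns]

# Fix the column order and add any outcome that never occurred in the data.
table = table.reindex(columns=["TP", "TN", "FN", "FP"], fill_value=0)
print(table)
```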
    
        2
  •  3
  •   Brian    6 years ago

    Each row falls into a group determined by its Signed_up and Prediction values. You can group them like this:

    grps = df.groupby(lambda index: (df.loc[index, 'Signed_up'], df.loc[index, 'Prediction']))
    

    tp_df = grps.get_group((1,1))
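
    As a rough sketch of how this could be carried through to per-user counts: the version below groups by the two column names directly (a swapped-in alternative to the lambda, which should produce the same groups) and then counts the rows of one group per user. It assumes the six sample rows from the question, and tp_per_user is a name made up for this example.

```python
import pandas as pd

# Sample rows copied from the question; rows hidden behind "..." are omitted.
df = pd.DataFrame({
    "Users": ["User1", "User2", "User1", "User3", "User2", "User2"],
    "Signed_up": [1, 0, 1, 1, 0, 0],
    "Prediction": [0, 0, 1, 1, 1, 0],
})

# Group rows by their (Signed_up, Prediction) pair.
grps = df.groupby(["Signed_up", "Prediction"])

# All rows that are true positives: Signed_up == 1 and Prediction == 1.
tp_df = grps.get_group((1, 1))

# Count those rows per user; users with no true positives get 0.
all_users = sorted(df["Users"].unique())
tp_per_user = tp_df.groupby("Users").size().reindex(all_users, fill_value=0)
print(tp_per_user)
```

    The same pattern with (0, 0), (1, 0) and (0, 1) would give TN, FN and FP.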
    
        3
  •  2
  •   d_kennetz    6 years ago

    Use the & bitwise operator. & means that both conditions must be met for a row to be returned, so:

    import pandas as pd

    df = pd.read_csv('./Desktop/models.csv')
    
    TP = df.loc[(df['Signed_up'] == 1) & (df['Prediction'] == 1)]
    
    TN = df.loc[(df['Signed_up'] == 0) & (df['Prediction'] == 0)]
    
    FN = df.loc[(df['Signed_up'] == 1) & (df['Prediction'] == 0)]
    
    FP = df.loc[(df['Signed_up'] == 0) & (df['Prediction'] == 1)]
    

    Output:

    >>> TP
       Users  Signed_up  Prediction
    2  User1          1           1
    3  User3          1           1
    >>> TN = df.loc[(df['Signed_up'] == 0) & (df['Prediction'] == 0)]
    >>> TN
       Users  Signed_up  Prediction
    1  User2          0           0
    5  User2          0           0
    >>> FN = df.loc[(df['Signed_up'] == 1) & (df['Prediction'] == 0)]
    >>> FN
       Users  Signed_up  Prediction
    0  User1          1           0
    >>> FP = df.loc[(df['Signed_up'] == 0) & (df['Prediction'] == 1)]
    >>> FP
       Users  Signed_up  Prediction
    4  User2          0           1
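
    The filters above return the matching rows, not the per-user table from the question. One way to bridge that gap, sketched here under the assumption that the data is the six sample rows (built inline instead of read from the CSV), is to keep the boolean masks and sum each one per user. The names masks and per_user are illustrative only.

```python
import pandas as pd

# Sample rows copied from the question; rows hidden behind "..." are omitted.
df = pd.DataFrame({
    "Users": ["User1", "User2", "User1", "User3", "User2", "User2"],
    "Signed_up": [1, 0, 1, 1, 0, 0],
    "Prediction": [0, 0, 1, 1, 1, 0],
})

# The same four conditions as above. The parentheses matter, because &
# binds more tightly than == in Python.
masks = {
    "TP": (df["Signed_up"] == 1) & (df["Prediction"] == 1),
    "TN": (df["Signed_up"] == 0) & (df["Prediction"] == 0),
    "FN": (df["Signed_up"] == 1) & (df["Prediction"] == 0),
    "FP": (df["Signed_up"] == 0) & (df["Prediction"] == 1),
}

# Group each boolean mask by user and sum it: True counts as 1.
per_user = pd.DataFrame(
    {name: mask.groupby(df["Users"]).sum() for name, mask in masks.items()}
)
print(per_user)
```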