
Find the proportion of one variable within each decile of another variable in Python

  •  0
  • Shuvayan Das  ·  6 years ago

    I have the following dataset:

    HID     Score   Decile_Name Result
    2089    62      4th decile  1
    897     47      2nd decile  0
    85      55      3rd decile  0
    8       74      7th decile  1
    23      31      1st decile  1
    5657    77      8th decile  1
    52      85      9th decile  0
    781     63      6th decile  0
    565     42      1st decile  0
    456     62      4th decile  1
    12      89      10th decile 1
    56      85      9th decile  1
    
    import pandas as pd

    # Create a DataFrame (values match the table above; every list has 12 entries)
    df1 = {
        'HID': [2089, 897, 85, 8, 23, 5657, 52, 781, 565, 456, 12, 56],
        'Score': [62, 47, 55, 74, 31, 77, 85, 63, 42, 62, 89, 85],
        'Decile_Name': ['4th decile', '2nd decile', '3rd decile', '7th decile',
                        '1st decile', '8th decile', '9th decile', '6th decile',
                        '1st decile', '4th decile', '10th decile', '9th decile'],
        'Result': [1, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1]
    }

    df1 = pd.DataFrame(df1, columns=['HID', 'Score', 'Decile_Name', 'Result'])
    

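    For each student, this captures the score in a subject and the decile that score falls into. It also captures whether the student passed or failed (Result).

    I want to compute the proportion of rows with Result = 1 within each decile (Result %) and overall (across the whole dataset). Expected output: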

    Attribute Level         Result %    num_of_stu  
    Score - All Categories  0.5         12 # This captures the values for the whole df(df1).
    Score - 1st Decile      0.5         2
    Score - 2nd Decile      0           1
    Score - 3rd Decile      0           1
    ...
    Score - 9th Decile      0.5         2
    Score - 10th Decile     1           1
    

    Can someone help me with this?

    2 answers  |  up to 6 years ago
        1
  •  1
  •   jezrael    6 years ago

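    Solution if the Result column contains only 0 and 1:

    First aggregate with agg, then sort the index by the integers pulled out of each label with extract, using argsort to get the order. Finally, create a one-row summary DataFrame for the whole dataset and combine it with the aggregated result: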

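    # df is the DataFrame from the question (named df1 there)
    df1 = df.groupby('Decile_Name').agg({'Result':'mean', 'HID':'size'})
    # sort rows by the number inside each label, not alphabetically
    df1 = df1.iloc[df1.index.str.extract(r'(\d+)', expand=False).astype(int).argsort()]
    
    df2 = pd.DataFrame({'Result': [df['Result'].mean()],
                        'HID': [len(df)]}, index=['All Categories'])
    
    d = {'Result':'Result %','HID':'num_of_stu'}
    # DataFrame.append was removed in pandas 2.0; pd.concat gives the same result
    df1 = pd.concat([df2, df1]).rename(columns=d)
    print(df1)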
                    Result %  num_of_stu
    All Categories  0.583333          12
    1st decile      0.500000           2
    2nd decile      0.000000           1
    3rd decile      0.000000           1
    4th decile      1.000000           2
    6th decile      0.000000           1
    7th decile      1.000000           1
    8th decile      1.000000           1
    9th decile      0.500000           2
    10th decile     1.000000           1
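
The sorting step is the least obvious part of the answer, so here is a minimal sketch of it in isolation. The labels below are hypothetical and deliberately out of order; the point is only to show how `str.extract` plus `argsort` gives a numeric ordering where a plain alphabetical sort would put "10th decile" first.

```python
import pandas as pd

# Hypothetical labels, deliberately out of order.
labels = pd.Index(['10th decile', '1st decile', '9th decile', '2nd decile'])

# Pull the digits out of each label and convert them to integers.
numbers = labels.str.extract(r'(\d+)', expand=False).astype(int)

# argsort returns the positions that would sort those integers ascending.
order = numbers.argsort()

# Reorder the original labels by those positions.
sorted_labels = labels[order]
print(list(sorted_labels))

# For comparison: a plain string sort puts '10th decile' before '1st decile'.
print(sorted(labels))
```

The same `order` array can be passed to `DataFrame.iloc` to reorder rows, which is what the answer does with the grouped result.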
    

    General solution, counting only the values equal to 1:

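    # df is the DataFrame from the question (named df1 there)
    # True where Result is 1; the mean of a boolean column is the share of True
    df['Result1'] = df['Result'] == 1
    df1 = df.groupby('Decile_Name').agg({'Result1':'mean', 'HID':'size'})
    df1 = df1.iloc[df1.index.str.extract(r'(\d+)', expand=False).astype(int).argsort()]
    
    df2 = pd.DataFrame({'Result1': [df['Result1'].mean()],
                      'HID': [len(df)]}, index=['All Categories'])
    
    d = {'Result1':'Result %','HID':'num_of_stu'}
    # DataFrame.append was removed in pandas 2.0; pd.concat gives the same result
    df1 = pd.concat([df2, df1]).rename(columns=d)
    print(df1)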
                    Result %  num_of_stu
    All Categories  0.583333          12
    1st decile      0.500000           2
    2nd decile      0.000000           1
    3rd decile      0.000000           1
    4th decile      1.000000           2
    6th decile      0.000000           1
    7th decile      1.000000           1
    8th decile      1.000000           1
    9th decile      0.500000           2
    10th decile     1.000000           1
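
To see why the comparison with 1 matters, here is a small sketch on hypothetical data where Result also takes the value 2. The frame `toy` and its values are made up for illustration and are not part of the question's dataset. A plain mean stops being a proportion as soon as values other than 0 and 1 appear, while the mean of the boolean mask still is one.

```python
import pandas as pd

# Hypothetical data: Result uses 0, 1 and 2, so a plain mean is not a proportion.
toy = pd.DataFrame({
    'Decile_Name': ['1st decile', '1st decile', '2nd decile', '2nd decile'],
    'Result': [1, 2, 0, 1],
})

# Comparing against 1 gives True/False; the mean of booleans is the share of True.
share_of_ones = (toy['Result'] == 1).groupby(toy['Decile_Name']).mean()

# Plain mean of the raw values, for comparison.
plain_mean = toy.groupby('Decile_Name')['Result'].mean()

print(share_of_ones)
print(plain_mean)
```

In the first decile the plain mean is 1.5, which cannot be read as a share, whereas the boolean mean is 0.5 because one of the two rows equals 1.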
    
        2
  •  0
  •   Florian H    6 years ago
    # mean of Result grouped by Decile_Name
    result_df = df1[['Decile_Name','Result']].groupby(['Decile_Name']).mean()
    
    # number of students grouped by Decile_Name
    students_df = df1[['Decile_Name','HID']].groupby(['Decile_Name']).count()
    
    # combine the two DataFrames side by side
    merged_df = pd.concat([result_df, students_df], axis=1)
    
    # add the overall mean and count for all students under the index "All Students"
    merged_df.loc["All Students"] = [df1['Result'].mean(), df1['HID'].count()]
    
    # print
    print(merged_df)
    

    Result:

                    Result   HID
    Decile_Name
    10th decile   1.000000   1.0
    1st decile    0.500000   2.0
    2nd decile    0.000000   1.0
    3rd decile    0.000000   1.0
    4th decile    1.000000   2.0
    6th decile    0.000000   1.0
    7th decile    1.000000   1.0
    8th decile    1.000000   1.0
    9th decile    0.500000   2.0
    All Students  0.583333  12.0
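
The output above is sorted alphabetically (10th decile comes first) and the counts are shown as floats. As a hedged alternative, and not something either answer proposes, the sketch below uses named aggregation in a single `groupby` call and `sort_index` with a `key` function (available from pandas 1.1). The column name `result_pct` is a hypothetical choice; `num_of_stu` and the "All Categories" label follow the question's expected output.

```python
import pandas as pd

# The question's data, without the Score column (not needed for the summary).
df = pd.DataFrame({
    'HID': [2089, 897, 85, 8, 23, 5657, 52, 781, 565, 456, 12, 56],
    'Decile_Name': ['4th decile', '2nd decile', '3rd decile', '7th decile',
                    '1st decile', '8th decile', '9th decile', '6th decile',
                    '1st decile', '4th decile', '10th decile', '9th decile'],
    'Result': [1, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1],
})

# Named aggregation: one groupby call, output columns named directly.
per_decile = df.groupby('Decile_Name').agg(
    result_pct=('Result', 'mean'),   # share of rows with Result == 1 (0/1 data only)
    num_of_stu=('HID', 'size'),      # number of students in the decile
)

# Sort by the number embedded in each label rather than alphabetically.
per_decile = per_decile.sort_index(
    key=lambda idx: idx.str.extract(r'(\d+)', expand=False).astype(int)
)

# One-row frame for the whole dataset, placed on top of the per-decile rows.
overall = pd.DataFrame(
    {'result_pct': [df['Result'].mean()], 'num_of_stu': [len(df)]},
    index=['All Categories'],
)
summary = pd.concat([overall, per_decile])
print(summary)
```

Building the overall row as its own DataFrame and concatenating keeps the count column as integers, because no float is ever written into it.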