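First, I think there is a mistake in your test data. If we count the rows the way you want to reshape them, you will see that almost every cell has a count of 1. For ID 6550692, INDEX 3, however, both `count2` values have a count of 2, and both `count3` values are missing entirely (NaN):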
```
VARIABLES     count1      count2      count3
hour             19    22    19    22    19    22
ID      INDEX
6549456 5       1.0   1.0   1.0   1.0   1.0   1.0
        6       1.0   1.0   1.0   1.0   1.0   1.0
        7       1.0   1.0   1.0   1.0   1.0   1.0
6549986 0       1.0   1.0   1.0   1.0   1.0   1.0
        1       1.0   1.0   1.0   1.0   1.0   1.0
        2       1.0   1.0   1.0   1.0   1.0   1.0
        3       1.0   1.0   1.0   1.0   1.0   1.0
        4       1.0   1.0   1.0   1.0   1.0   1.0
        5       1.0   1.0   1.0   1.0   1.0   1.0
        6       1.0   1.0   1.0   1.0   1.0   1.0
        7       1.0   1.0   1.0   1.0   1.0   1.0
6550692 0       1.0   1.0   1.0   1.0   1.0   1.0
        1       1.0   1.0   1.0   1.0   1.0   1.0
        2       1.0   1.0   1.0   1.0   1.0   1.0
        3       1.0   1.0   2.0   2.0   NaN   NaN
        4       1.0   1.0   1.0   1.0   1.0   1.0
        5       1.0   1.0   1.0   1.0   1.0   1.0
        6       1.0   1.0   1.0   1.0   1.0   1.0
        7       1.0   1.0   1.0   1.0   1.0   1.0
```
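To check this on your own data, a sketch like the following counts how many rows fall into each (ID, INDEX, VARIABLES, hour) cell and lists the rows that collide. It assumes your data is in long format with the columns `ID`, `INDEX`, `VARIABLES`, `hour` and `VALUE` (the names the groupby code below uses); the small `df` here is hypothetical toy data, not your actual values.

```python
import pandas as pd

keys = ["ID", "INDEX", "VARIABLES", "hour"]

# Hypothetical toy data: INDEX 2 is complete, while INDEX 3 has count2
# twice per hour and no count3 rows at all.
df = pd.DataFrame(
    [
        (6550692, 2, "count1", 19, 0.0),
        (6550692, 2, "count1", 22, 0.0),
        (6550692, 2, "count2", 19, 0.0),
        (6550692, 2, "count2", 22, 0.0),
        (6550692, 2, "count3", 19, 0.0),
        (6550692, 2, "count3", 22, 0.0),
        (6550692, 3, "count1", 19, 200.0),
        (6550692, 3, "count1", 22, 200.0),
        (6550692, 3, "count2", 19, 100.0),
        (6550692, 3, "count2", 22, 100.0),
        (6550692, 3, "count2", 19, 300.0),
        (6550692, 3, "count2", 22, 300.0),
    ],
    columns=keys + ["VALUE"],
)

# Number of rows per key combination, with VARIABLES and hour moved into the columns.
# Combinations that never occur show up as NaN.
counts = df.groupby(keys).size().unstack([-2, -1])
print(counts)

# Every row whose key combination occurs more than once.
dupes = df[df.duplicated(keys, keep=False)]
print(dupes)
```

Any cell other than 1 in `counts` is a spot where a plain reshape has no single value to put.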
However, we can still reshape the data by aggregating the cells that have two values, for example by taking their mean:
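```python
# Average any duplicate (ID, INDEX, VARIABLES, hour) rows, then move VARIABLES and hour into the columns
df_out = df.groupby(['ID', 'INDEX', 'VARIABLES', 'hour'])['VALUE'].mean().unstack([-2, -1])

# Flatten the two-level column index, e.g. ('count1', 19) -> 'count1_19'
df_out.columns = df_out.columns.map('{0[0]}_{0[1]}'.format)
print(df_out.reset_index())
```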
Output:
```
       ID  INDEX  count1_19  count1_22  count2_19  count2_22  count3_19  count3_22
 0  6549456      5   164663.0   164663.0   164663.0   164663.0   164663.0   164663.0
 1  6549456      6  4640344.0  4640344.0  4640344.0  4640344.0  4640344.0  4640344.0
 2  6549456      7        0.0        0.0        0.0        0.0        0.0        0.0
 3  6549986      0   265705.0   265705.0   265705.0   265705.0   265705.0   265705.0
 4  6549986      1  1016836.0  1016836.0  1016836.0  1016836.0  1016836.0  1016836.0
 5  6549986      2        0.0        0.0        0.0        0.0        0.0        0.0
 6  6549986      3     5047.0     5047.0     5047.0     5047.0     5047.0     5047.0
 7  6549986      4        0.0        0.0        0.0        0.0        0.0        0.0
 8  6549986      5        0.0        0.0        0.0        0.0        0.0        0.0
 9  6549986      6 18661246.0 18661246.0 18661246.0 18661246.0 18661246.0 18661246.0
10  6549986      7        0.0        0.0        0.0        0.0        0.0        0.0
11  6550692      0  1500517.0  1500517.0  1500517.0  1500517.0  1500517.0  1500517.0
12  6550692      1 34513980.0 34513980.0 34513980.0 34513980.0 34513980.0 34513980.0
13  6550692      2        0.0        0.0        0.0        0.0        0.0        0.0
14  6550692      3   230993.0   230993.0   230993.0   230993.0        NaN        NaN
15  6550692      4      156.0      156.0      156.0      156.0      156.0      156.0
16  6550692      5        0.0        0.0        0.0        0.0        0.0        0.0
17  6550692      6 14958246.0 14958246.0 14958246.0 14958246.0 14958246.0 14958246.0
18  6550692      7        0.0        0.0        0.0        0.0        0.0        0.0
```
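For comparison, the same reshape can be sketched with `pivot_table`, which aggregates duplicate cells itself (mean by default), whereas a plain `pivot` refuses them with a `ValueError`. This assumes the column names from the groupby code above and a pandas version recent enough (1.1+) to accept lists for `pivot`'s `index`/`columns`; the `df` is hypothetical toy data.

```python
import pandas as pd

keys = ["ID", "INDEX", "VARIABLES", "hour"]

# Hypothetical toy data: count2 appears twice for each hour.
df = pd.DataFrame(
    [
        (6550692, 3, "count1", 19, 200.0),
        (6550692, 3, "count1", 22, 200.0),
        (6550692, 3, "count2", 19, 100.0),
        (6550692, 3, "count2", 22, 100.0),
        (6550692, 3, "count2", 19, 300.0),
        (6550692, 3, "count2", 22, 300.0),
    ],
    columns=keys + ["VALUE"],
)

# A plain pivot cannot reshape when a key combination occurs twice.
pivot_failed = False
try:
    df.pivot(index=["ID", "INDEX"], columns=["VARIABLES", "hour"], values="VALUE")
except ValueError as err:
    pivot_failed = True
    print("pivot failed:", err)

# pivot_table averages the duplicates, like groupby(...).mean().unstack(...).
wide = df.pivot_table(
    index=["ID", "INDEX"], columns=["VARIABLES", "hour"], values="VALUE", aggfunc="mean"
)
# Flatten ('count1', 19) -> 'count1_19', as in the answer.
wide.columns = [f"{var}_{hour}" for var, hour in wide.columns]
wide = wide.reset_index()
print(wide)
```

If averaging the duplicates is not what you want, `aggfunc` (or the `.mean()` in the groupby version) can be swapped for `"sum"`, `"first"` or `"max"`; which one is right depends on why the duplicates exist in your data.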