I think you need GroupBy.transform with mean. It returns a Series the same size as the original DataFrame, so you can subtract it from the column with Series.sub:
s = self.data_train.groupby('userId')['rating'].transform('mean')
self.data_train['new'] = self.data_train['rating'].sub(s)
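To see why transform is used here rather than a plain groupby mean, here is a minimal sketch on a small made-up frame. The name df and its values are hypothetical and not taken from the question.

```python
import pandas as pd

# Hypothetical toy data, not taken from the question.
df = pd.DataFrame({'userId': [1, 1, 2], 'rating': [4.0, 2.0, 5.0]})

# Aggregation: one value per group, indexed by userId.
agg = df.groupby('userId')['rating'].mean()

# Transform: one value per original row, indexed like df.
per_row = df.groupby('userId')['rating'].transform('mean')

print(len(agg), len(per_row))          # 2 3
print(per_row.index.equals(df.index))  # True
```

Because the transformed Series shares the index of df, it lines up row by row with the rating column, which is what makes the subtraction work without a merge.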
Sample, with userId values chosen to make a better example:
print(data_train)
       userId  movieId  rating   timestamp
65414     466      608     4.0   945139883
79720     466     6218     4.0  1089518106
63354     457     4007     3.5  1471383787
29923     466    59333     2.5  1462636955
63651     457   102194     2.5  1471383710
s = data_train.groupby('userId')['rating'].transform('mean')
print(s)
65414    3.5
79720    3.5
63354    3.0
29923    3.5
63651    3.0
Name: rating, dtype: float64
data_train['new'] = data_train['rating'].sub(s)
print(data_train)
       userId  movieId  rating   timestamp   new
65414     466      608     4.0   945139883   0.5
79720     466     6218     4.0  1089518106   0.5
63354     457     4007     3.5  1471383787   0.5
29923     466    59333     2.5  1462636955  -1.0
63651     457   102194     2.5  1471383710  -0.5
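
For anyone who wants to run the sample directly, the following is a self-contained sketch that rebuilds the five rows shown above and adds a quick sanity check. The check variable is my own addition and is not part of the original answer: after subtracting each user's mean, the new column should sum to zero within every userId.

```python
import pandas as pd

# Rebuild the sample rows shown above, keeping the original index labels.
data_train = pd.DataFrame(
    {
        'userId': [466, 466, 457, 466, 457],
        'movieId': [608, 6218, 4007, 59333, 102194],
        'rating': [4.0, 4.0, 3.5, 2.5, 2.5],
        'timestamp': [945139883, 1089518106, 1471383787,
                      1462636955, 1471383710],
    },
    index=[65414, 79720, 63354, 29923, 63651],
)

# Per-row mean rating of the row's user, aligned to data_train's index.
s = data_train.groupby('userId')['rating'].transform('mean')

# Rating minus the user's mean rating.
data_train['new'] = data_train['rating'].sub(s)

# Sanity check (an addition, not in the original answer):
# the centered ratings should sum to zero for each user.
check = data_train.groupby('userId')['new'].sum()
print(check)
```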