
Unexpected behavior when creating a column based on a calculation performed on a reset index

pandas python

sacuL · Tech Community · 6 years ago

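While answering this question, I ran into what I believe is unexpected behavior when trying to create a column based on a basic calculation performed on a dataframe's index. I'm not really looking for a solution; I'm trying to figure out why this happens. I may be overlooking something basic...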

Setup:

import numpy as np
import pandas as pd

np.random.seed(42)

df = pd.DataFrame(np.random.randint(0,5,9), index=[0,1,2,0,1,2,0,1,2])

>>> df
   0
0  3
1  4
2  2
0  4
1  4
2  1
0  2
1  2
2  2

Strange behavior:

Suppose I want a cumulative count of the rows where the index equals 0. I can do that easily:

>>> df.reset_index()['index'].eq(0).cumsum()
0    1
1    1
2    1
3    2
4    2
5    2
6    3
7    3
8    3
Name: index, dtype: int64
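A quick check of what this Series looks like compared with df: reset_index() gives the result a fresh RangeIndex 0..8, while df keeps the repeating labels 0, 1, 2. A minimal sketch that rebuilds the same df as in the setup (the name `s` is only for illustration):

```python
import numpy as np
import pandas as pd

np.random.seed(42)
df = pd.DataFrame(np.random.randint(0, 5, 9), index=[0, 1, 2, 0, 1, 2, 0, 1, 2])

# The cumulative count built from the reset frame
s = df.reset_index()['index'].eq(0).cumsum()

print(s.index.tolist())    # [0, 1, 2, 3, 4, 5, 6, 7, 8]  (fresh RangeIndex)
print(df.index.tolist())   # [0, 1, 2, 0, 1, 2, 0, 1, 2]  (original labels)
print(df.index.is_unique)  # False: df has duplicate labels
```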

However, if I try to assign this directly to a new column, the result is incorrect:

df['new_column'] = df.reset_index()['index'].eq(0).cumsum()

>>> df
   0  new_column
0  3           1
1  4           1
2  2           1
0  4           1
1  4           1
2  1           1
0  2           1
1  2           1
2  2           1
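What appears to be happening: when a Series is assigned to a column, pandas aligns it on index labels, not on position. Every row of df labelled 0, 1 or 2 looks up label 0, 1 or 2 in the new Series, and those first three values are all 1. The sketch below reproduces the column with an explicit reindex (`s` and `aligned` are illustrative names):

```python
import numpy as np
import pandas as pd

np.random.seed(42)
df = pd.DataFrame(np.random.randint(0, 5, 9), index=[0, 1, 2, 0, 1, 2, 0, 1, 2])
s = df.reset_index()['index'].eq(0).cumsum()

# Assigning a Series aligns it on df's index labels
df['new_column'] = s

# The same alignment done by hand: look up each of df's labels in s
aligned = s.reindex(df.index)

print(aligned.tolist())  # [1, 1, 1, 1, 1, 1, 1, 1, 1]
print((df['new_column'].to_numpy() == aligned.to_numpy()).all())  # True
```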

The same thing happens if I use assign:

df.assign(new_column = df.reset_index()['index'].eq(0).cumsum())

Expected behavior:

I would have expected the result to be:

>>> df
   0  new_column
0  3           1
1  4           1
2  2           1
3  4           2
4  4           2
5  1           2
6  2           3
7  2           3
8  2           3
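For comparison, a sketch that produces the expected counts without going through reset_index(): comparing df.index with 0 returns a plain NumPy boolean array, which carries no index, so it is assigned by position. Unlike the output above, the row labels here stay 0, 1, 2 repeated.

```python
import numpy as np
import pandas as pd

np.random.seed(42)
df = pd.DataFrame(np.random.randint(0, 5, 9), index=[0, 1, 2, 0, 1, 2, 0, 1, 2])

# df.index == 0 is a NumPy bool array; cumsum counts the 0-labels seen so far
df['new_column'] = (df.index == 0).cumsum()

print(df['new_column'].tolist())  # [1, 1, 1, 2, 2, 2, 3, 3, 3]
```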

Workarounds:

There are plenty of workarounds, for example:

df = df.reset_index().rename(columns={'index':'tmp'})

df['new_column'] = df.tmp.eq(0).cumsum()

df.drop('tmp', axis=1, inplace=True)
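The first workaround, wrapped into a runnable sketch. Because the column is computed after reset_index(), both sides share the same RangeIndex and label alignment does no harm; note that the frame ends up with labels 0..8 instead of the original repeating ones. The helper name `add_counter_via_reset` is hypothetical.

```python
import numpy as np
import pandas as pd

def add_counter_via_reset(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: the reset_index / rename / drop workaround."""
    out = df.reset_index().rename(columns={'index': 'tmp'})
    # 'tmp' and out share the same RangeIndex, so alignment is a no-op
    out['new_column'] = out['tmp'].eq(0).cumsum()
    return out.drop(columns='tmp')

np.random.seed(42)
df = pd.DataFrame(np.random.randint(0, 5, 9), index=[0, 1, 2, 0, 1, 2, 0, 1, 2])
result = add_counter_via_reset(df)

print(result.index.tolist())          # [0, 1, 2, 3, 4, 5, 6, 7, 8]
print(result['new_column'].tolist())  # [1, 1, 1, 2, 2, 2, 3, 3, 3]
```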

or

df.loc[0,'new_column'] = 1

df['new_column'] = df['new_column'].fillna(0).cumsum().astype(int)
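The second workaround uses label-based setting to its advantage: df.loc[0, 'new_column'] = 1 creates the column, writes 1 into every row labelled 0 and leaves the other rows as NaN; fillna(0).cumsum() then counts those rows. A sketch that keeps the intermediate column (`marked` is an illustrative name) so the two steps can be inspected:

```python
import numpy as np
import pandas as pd

np.random.seed(42)
df = pd.DataFrame(np.random.randint(0, 5, 9), index=[0, 1, 2, 0, 1, 2, 0, 1, 2])

# Setting with .loc on a new column: rows labelled 0 get 1.0, the rest get NaN
df.loc[0, 'new_column'] = 1
marked = df['new_column'].copy()
print(marked.isna().tolist())

# Replace NaN with 0, take the running total, and go back to integers
df['new_column'] = df['new_column'].fillna(0).cumsum().astype(int)
print(df['new_column'].tolist())  # [1, 1, 1, 2, 2, 2, 3, 3, 3]
```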

Question:

But as I said, I'm only interested in why this happens when I assign directly from reset_index().

Thanks for your input!

2 replies | last active 6 years ago

rafaelc · 6 years ago


df

Sheldore · 6 years ago

df['new_column'] = df.reset_index()['index'].eq(0).cumsum().values

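Adding .values turns the pandas.core.series.Series into a plain NumPy array, so the values are assigned by position instead of being aligned on the index.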
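A small comparison under the same setup. to_numpy() is the newer spelling of .values; both return an array with no index, so pandas assigns it by position. A third option, set_axis, relabels the Series with df's own index; the two indexes are then equal, and pandas copies the values over without reordering them.

```python
import numpy as np
import pandas as pd

np.random.seed(42)
df = pd.DataFrame(np.random.randint(0, 5, 9), index=[0, 1, 2, 0, 1, 2, 0, 1, 2])
s = df.reset_index()['index'].eq(0).cumsum()

# A bare NumPy array carries no index, so it is assigned by position
df['by_values'] = s.values
df['by_numpy'] = s.to_numpy()

# Relabelling s with df's index makes the indexes equal, so nothing is reordered
df['by_set_axis'] = s.set_axis(df.index)

print(df)
```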
