
Computing the mode on a DataFrame without sorting the result

mode dataframe pandas python

YOLO · 7 years ago

I have a DataFrame like this:

df = pd.DataFrame({'a1': [2,3,4,8,8], 'a2': [2,5,7,5,10], 'a3':[1,9,4,10,2]})

    a1  a2  a3
0   2   2   1
1   3   5   9
2   4   7   4
3   8   5   10
4   8   10  2

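The output should be computed as follows:

Approach: I want to compute the mode row-wise, and if a row has no mode, I want the value from a1 (the first column) instead.

For example: the second row (3, 5, 9) has no mode, so 3 appears in the output.

Note: I have already tried `df.mode(axis=1)`, but it seems to sort the values within each row, so I don't always get the first column's value in the output.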

2 answers | last activity 4 years ago

cs95 abhishek58g · 7 years ago

Method without sorting

agg + collections.Counter. This does not sort the modes.

from collections import Counter
df.agg(lambda x: Counter(x).most_common(1)[0][0], axis=1)

0    2
1    3
2    4
3    8
4    8
dtype: int64
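The one-liner above relies on Counter.most_common returning equally frequent values in the order they were first encountered, so a row with no repeated value yields its a1 entry. As a minimal sketch, the fallback can also be made explicit. The helper name row_mode_or_first is hypothetical, and treating a tie for the highest count as "no mode" is an assumption about what the question intends.

```python
from collections import Counter

import pandas as pd


def row_mode_or_first(row):
    """Return the row's unique mode, or its first value if no unique mode exists."""
    top_two = Counter(row).most_common(2)
    # One distinct value, or a strict winner: that value is the mode.
    if len(top_two) == 1 or top_two[0][1] > top_two[1][1]:
        return top_two[0][0]
    # The highest count is shared (assumed to mean "no mode"): use the first column.
    return row.iloc[0]


df = pd.DataFrame({'a1': [2, 3, 4, 8, 8], 'a2': [2, 5, 7, 5, 10], 'a3': [1, 9, 4, 10, 2]})
result = df.apply(row_mode_or_first, axis=1)
print(result.tolist())  # [2, 3, 4, 8, 8]
```

On a wider row such as (1, 2, 2, 3, 3), this sketch returns 1 (the first column), whereas the one-liner would return 2, the first value to reach the top count.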

Methods that sort the modes

Use mode along axis 1, then take whichever value comes first:

df.mode(axis=1).iloc[:, 0]

or

df.mode(axis=1)[0]

0    2.0
1    3.0
2    4.0
3    5.0
4    2.0
Name: 0, dtype: float64
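To see why the first mode column differs from a1 in rows 3 and 4, it can help to look at the whole result of df.mode(axis=1). The sketch below rebuilds the DataFrame from the question and inspects the two rows without a repeated value; the variable name modes is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'a1': [2, 3, 4, 8, 8], 'a2': [2, 5, 7, 5, 10], 'a3': [1, 9, 4, 10, 2]})

# One column per possible mode; rows with fewer modes are padded with NaN.
modes = df.mode(axis=1)

# Rows without a repeated value list every value, in ascending order,
# so column 0 holds the row minimum rather than the a1 value.
print(modes.iloc[3].tolist())  # [5.0, 8.0, 10.0]
print(modes.iloc[4].tolist())  # [2.0, 8.0, 10.0]
```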

scipy.stats.mode

import numpy as np
from scipy.stats import mode
np.array(mode(df, axis=1))[0].squeeze()
array([2, 3, 4, 5, 2])

DYZ · 7 years ago

Another option is to use np.where:

mode = df.mode(axis=1)
np.where(mode.iloc[:,-1].isnull(),
    mode.iloc[:,0], # No tie, use the calculated mode 
    df.iloc[:,0]) # Tie, use the first column of the original df
# array([2., 3., 4., 8., 8.])
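The np.where result above is a float array because the mode table is padded with NaN. A possible variant, sketched below, tests the second mode column instead of the last one (so a two-way tie in a wider frame would also fall back to the first column) and casts back to the dtype of a1. It assumes the original data contains no missing values; the names modes and has_unique_mode are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a1': [2, 3, 4, 8, 8], 'a2': [2, 5, 7, 5, 10], 'a3': [1, 9, 4, 10, 2]})
modes = df.mode(axis=1)

# A row has a unique mode when its second mode slot is NaN.
# If only one mode column exists, every row has a unique mode.
if modes.shape[1] > 1:
    has_unique_mode = modes.iloc[:, 1].isna()
else:
    has_unique_mode = pd.Series(True, index=df.index)

# Unique mode: use it. Otherwise: use the first column of the original df.
# The cast assumes no NaN values in df.
result = np.where(has_unique_mode, modes.iloc[:, 0], df.iloc[:, 0])
result = result.astype(df.iloc[:, 0].dtype)
print(result)  # [2 3 4 8 8]
```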
