I have the following DataFrame:
import pandas as pd
df = pd.DataFrame({"E": ["X", "Y", "X", "X", "Y", "X"], "F": ["Y", "Y", "X", "Y", "X","Y"], "G": ["Y", "X", "X", "X", "Y", "X"], "I": ["A", "B", "B", "B", "A", "A"]})
df.set_index("I", drop = True, inplace = True)
print(df)
   E  F  G
I
A  X  Y  Y
B  Y  Y  X
B  X  X  X
B  X  Y  X
A  Y  X  Y
A  X  Y  X
I now want to count how often each combination (A-X, A-Y, B-X, B-Y) occurs in each column (E, F, G), so the expected output is:
   E     F     G
   X  Y  X  Y  X  Y
I
A  2  1  1  2  1  2
B  2  1  1  2  3  0
I know I can use pd.crosstab, so I could loop over the columns and concatenate the resulting DataFrames:
for i, column in enumerate(df.columns):
    if i == 0:
        df1 = pd.crosstab(df.index, df[column])
    else:
        df1 = pd.concat([df1, pd.crosstab(df.index, df[column])], axis=1)
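However, apart from the loop feeling clumsy (I would hope there is a better solution), the resulting column index loses the information about which original column each count came from: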
       X  Y  X  Y  X  Y
row_0
A      2  1  1  2  1  2
B      2  1  1  2  3  0
What is the right way to get the expected output?
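
One possible direction, offered as a sketch rather than a definitive answer: reshape the frame to long format with melt and then call pd.crosstab once, passing two column keys so that the original column name survives as the outer level of a column MultiIndex. The names long, result, "column" and "value" are chosen here for illustration and are not part of the original code.

```python
import pandas as pd

df = pd.DataFrame({
    "E": ["X", "Y", "X", "X", "Y", "X"],
    "F": ["Y", "Y", "X", "Y", "X", "Y"],
    "G": ["Y", "X", "X", "X", "Y", "X"],
    "I": ["A", "B", "B", "B", "A", "A"],
})
df.set_index("I", drop=True, inplace=True)

# Long format: one row per (I, original column, value) observation.
# "column" and "value" are illustrative names for the melted columns.
long = df.reset_index().melt(id_vars="I", var_name="column", value_name="value")

# A single crosstab call: rows are I, columns are (original column, value) pairs.
# Combinations that never occur (e.g. B / G / Y) are filled with 0.
result = pd.crosstab(long["I"], [long["column"], long["value"]])

print(result)
```

This avoids the explicit loop, and the counts can be addressed as result.loc["A", ("E", "X")]. The column levels are labelled "column" and "value" here; if unlabelled levels are preferred, result.rename_axis(columns=[None, None]) should remove those labels.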