
Sorting a hierarchical index by column while keeping the original columns

pandas python-3.x

Karn Kumar · 6 years ago

I am trying to simulate a hierarchical-index DataFrame, as shown below:

>>> import pandas as pd
>>> raw_data = ({'city': ['Delhi', 'Kanpur', 'Mumbai', 'Pune','Delhi', 'Kanpur', 'Mumbai', 'Pune'],
...                 'rank': ['1st', '2nd', '1st', '2nd','1st', '2nd', '1st', '2nd'],
...                 'name': ['Ramesh', 'Kirpal', 'Jungi', 'Sanju','Ramesh', 'Kirpal', 'Jungi', 'Sanju'],
...                 'score1': [10,15,20,25,10,15,20,25],
...                 'score2': [20,35,40,45,20,35,40,45]})

Here is what the DataFrame looks like; it comes with the default index.

>>> df = pd.DataFrame(raw_data, columns = ['city', 'rank', 'name', 'score1', 'score2'])
>>> df
     city rank    name  score1  score2
0   Delhi  1st  Ramesh      10      20
1  Kanpur  2nd  Kirpal      15      35
2  Mumbai  1st   Jungi      20      40
3    Pune  2nd   Sanju      25      45
4   Delhi  1st  Ramesh      10      20
5  Kanpur  2nd  Kirpal      15      35
6  Mumbai  1st   Jungi      20      40
7    Pune  2nd   Sanju      25      45

I want to build the index from 'city' and 'rank' with the set_index method, while keeping the original columns.

>>> df.set_index(['city', 'rank'], drop=False)
               city rank    name  score1  score2
city   rank
Delhi  1st    Delhi  1st  Ramesh      10      20
Kanpur 2nd   Kanpur  2nd  Kirpal      15      35
Mumbai 1st   Mumbai  1st   Jungi      20      40
Pune   2nd     Pune  2nd   Sanju      25      45
Delhi  1st    Delhi  1st  Ramesh      10      20
Kanpur 2nd   Kanpur  2nd  Kirpal      15      35
Mumbai 1st   Mumbai  1st   Jungi      20      40
Pune   2nd     Pune  2nd   Sanju      25      45

But I would like the result sorted by the index level city first, then by the index level rank:

               city rank    name  score1  score2
city   rank
Delhi  1st    Delhi  1st  Ramesh      10      20
       1st    Delhi  1st  Ramesh      10      20

Kanpur 2nd   Kanpur  2nd  Kirpal      15      35
       2nd   Kanpur  2nd  Kirpal      15      35

Mumbai 1st   Mumbai  1st   Jungi      20      40
       1st   Mumbai  1st   Jungi      20      40

Pune   2nd     Pune  2nd   Sanju      25      45
       2nd     Pune  2nd   Sanju      25      45

1 answer | active 6 years ago

rahlf23 · 6 years ago

You're almost there; you just need to apply sort_index():

df.set_index(['city','rank'], drop=False).sort_index()

Yields:

               city rank    name  score1  score2
city   rank                                     
Delhi  1st    Delhi  1st  Ramesh      10      20
       1st    Delhi  1st  Ramesh      10      20
Kanpur 2nd   Kanpur  2nd  Kirpal      15      35
       2nd   Kanpur  2nd  Kirpal      15      35
Mumbai 1st   Mumbai  1st   Jungi      20      40
       1st   Mumbai  1st   Jungi      20      40
Pune   2nd     Pune  2nd   Sanju      25      45
       2nd     Pune  2nd   Sanju      25      45
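
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same steps. The variable names `indexed` and `sorted_df` are mine, not from the question. Note that `sort_index()` returns a new DataFrame rather than modifying `df` in place, so the result has to be assigned.

```python
import pandas as pd

# Same data as in the question
raw_data = {
    'city': ['Delhi', 'Kanpur', 'Mumbai', 'Pune', 'Delhi', 'Kanpur', 'Mumbai', 'Pune'],
    'rank': ['1st', '2nd', '1st', '2nd', '1st', '2nd', '1st', '2nd'],
    'name': ['Ramesh', 'Kirpal', 'Jungi', 'Sanju', 'Ramesh', 'Kirpal', 'Jungi', 'Sanju'],
    'score1': [10, 15, 20, 25, 10, 15, 20, 25],
    'score2': [20, 35, 40, 45, 20, 35, 40, 45],
}
df = pd.DataFrame(raw_data)

# drop=False keeps 'city' and 'rank' as ordinary columns as well as index levels
indexed = df.set_index(['city', 'rank'], drop=False)

# Sort the MultiIndex lexicographically: by 'city' first, then by 'rank'
sorted_df = indexed.sort_index()

print(sorted_df)
```

If only one level should drive the ordering, `sort_index(level='city')` is the variant to look at.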

To remove the duplicate rows, add drop_duplicates():

df.set_index(['city','rank'], drop=False).sort_index().drop_duplicates()

Yields:

               city rank    name  score1  score2
city   rank                                     
Delhi  1st    Delhi  1st  Ramesh      10      20
Kanpur 2nd   Kanpur  2nd  Kirpal      15      35
Mumbai 1st   Mumbai  1st   Jungi      20      40
Pune   2nd     Pune  2nd   Sanju      25      45
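
One caveat worth keeping in mind: `drop_duplicates()` compares column values only and ignores the index. It works here because `drop=False` left 'city' and 'rank' in the columns. The sketch below uses a shortened, made-up version of the data (the names `deduped` and `by_index` are mine) and also shows an index-based alternative using `Index.duplicated()`, which would still work if the index columns had been dropped.

```python
import pandas as pd

# Shortened, hypothetical version of the question's data
df = pd.DataFrame({
    'city': ['Delhi', 'Kanpur', 'Delhi', 'Kanpur'],
    'rank': ['1st', '2nd', '1st', '2nd'],
    'score1': [10, 15, 10, 15],
})

indexed = df.set_index(['city', 'rank'], drop=False).sort_index()

# Column-based: rows are duplicates when all column values match
deduped = indexed.drop_duplicates()

# Index-based alternative: keep the first row for each (city, rank) pair
by_index = indexed[~indexed.index.duplicated(keep='first')]

print(deduped)
print(by_index)
```

The two approaches agree on this data because rows with the same (city, rank) pair are identical in every column; they would differ if, say, two rows for Delhi had different scores.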
