I am trying to build a hierarchically indexed DataFrame from the data below:
>>> import pandas as pd
>>> raw_data = {'city': ['Delhi', 'Kanpur', 'Mumbai', 'Pune', 'Delhi', 'Kanpur', 'Mumbai', 'Pune'],
... 'rank': ['1st', '2nd', '1st', '2nd', '1st', '2nd', '1st', '2nd'],
... 'name': ['Ramesh', 'Kirpal', 'Jungi', 'Sanju', 'Ramesh', 'Kirpal', 'Jungi', 'Sanju'],
... 'score1': [10, 15, 20, 25, 10, 15, 20, 25],
... 'score2': [20, 35, 40, 45, 20, 35, 40, 45]}
Below is what the DataFrame looks like; it comes with the default integer index.
>>> df = pd.DataFrame(raw_data, columns = ['city', 'rank', 'name', 'score1', 'score2'])
>>> df
     city rank    name  score1  score2
0   Delhi  1st  Ramesh      10      20
1  Kanpur  2nd  Kirpal      15      35
2  Mumbai  1st   Jungi      20      40
3    Pune  2nd   Sanju      25      45
4   Delhi  1st  Ramesh      10      20
5  Kanpur  2nd  Kirpal      15      35
6  Mumbai  1st   Jungi      20      40
7    Pune  2nd   Sanju      25      45
I want to create a hierarchical index from 'city' and 'rank' with the set_index method, while keeping the original columns.
>>> df.set_index(['city', 'rank'], drop=False)
               city rank    name  score1  score2
city   rank
Delhi  1st    Delhi  1st  Ramesh      10      20
Kanpur 2nd   Kanpur  2nd  Kirpal      15      35
Mumbai 1st   Mumbai  1st   Jungi      20      40
Pune   2nd     Pune  2nd   Sanju      25      45
Delhi  1st    Delhi  1st  Ramesh      10      20
Kanpur 2nd   Kanpur  2nd  Kirpal      15      35
Mumbai 1st   Mumbai  1st   Jungi      20      40
Pune   2nd     Pune  2nd   Sanju      25      45
But I would like the rows grouped by the city index level first and then by the rank level, like this:
               city rank    name  score1  score2
city   rank
Delhi  1st    Delhi  1st  Ramesh      10      20
       1st    Delhi  1st  Ramesh      10      20
Kanpur 2nd   Kanpur  2nd  Kirpal      15      35
       2nd   Kanpur  2nd  Kirpal      15      35
Mumbai 1st   Mumbai  1st   Jungi      20      40
       1st   Mumbai  1st   Jungi      20      40
Pune   2nd     Pune  2nd   Sanju      25      45
       2nd     Pune  2nd   Sanju      25      45
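
For reference, here is a minimal sketch of one way this layout might be reached, assuming the goal is simply to have rows with the same ('city', 'rank') pair sit next to each other. It sorts the MultiIndex with sort_index() after set_index(). The variable names indexed and result are only illustrative, and I am not sure this is the best approach.

```python
import pandas as pd

raw_data = {
    'city': ['Delhi', 'Kanpur', 'Mumbai', 'Pune', 'Delhi', 'Kanpur', 'Mumbai', 'Pune'],
    'rank': ['1st', '2nd', '1st', '2nd', '1st', '2nd', '1st', '2nd'],
    'name': ['Ramesh', 'Kirpal', 'Jungi', 'Sanju', 'Ramesh', 'Kirpal', 'Jungi', 'Sanju'],
    'score1': [10, 15, 20, 25, 10, 15, 20, 25],
    'score2': [20, 35, 40, 45, 20, 35, 40, 45],
}
df = pd.DataFrame(raw_data, columns=['city', 'rank', 'name', 'score1', 'score2'])

# Use 'city' and 'rank' as index levels; drop=False keeps them as columns too.
indexed = df.set_index(['city', 'rank'], drop=False)

# Sort by the index levels (city first, then rank) so equal keys become adjacent.
result = indexed.sort_index()
print(result)
```

Sorting by the index rather than by the columns sidesteps the fact that 'city' and 'rank' now exist both as index levels and as column labels.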