
Sorting a hierarchical index by column while keeping the original columns

  • Karn Kumar  ·  6 years ago

    I am trying to build a hierarchically indexed DataFrame from the following data:

    >>> import pandas as pd
    >>> raw_data = {'city': ['Delhi', 'Kanpur', 'Mumbai', 'Pune', 'Delhi', 'Kanpur', 'Mumbai', 'Pune'],
    ...             'rank': ['1st', '2nd', '1st', '2nd', '1st', '2nd', '1st', '2nd'],
    ...             'name': ['Ramesh', 'Kirpal', 'Jungi', 'Sanju', 'Ramesh', 'Kirpal', 'Jungi', 'Sanju'],
    ...             'score1': [10, 15, 20, 25, 10, 15, 20, 25],
    ...             'score2': [20, 35, 40, 45, 20, 35, 40, 45]}
    

    This is what the DataFrame looks like; it comes with the default integer index.

    >>> df = pd.DataFrame(raw_data, columns = ['city', 'rank', 'name', 'score1', 'score2'])
    >>> df
         city rank    name  score1  score2
    0   Delhi  1st  Ramesh      10      20
    1  Kanpur  2nd  Kirpal      15      35
    2  Mumbai  1st   Jungi      20      40
    3    Pune  2nd   Sanju      25      45
    4   Delhi  1st  Ramesh      10      20
    5  Kanpur  2nd  Kirpal      15      35
    6  Mumbai  1st   Jungi      20      40
    7    Pune  2nd   Sanju      25      45
    

    I want to use the 'city' and 'rank' columns as the index via the set_index method, while keeping the original columns.

    >>> df.set_index(['city', 'rank'], drop=False)
                   city rank    name  score1  score2
    city   rank
    Delhi  1st    Delhi  1st  Ramesh      10      20
    Kanpur 2nd   Kanpur  2nd  Kirpal      15      35
    Mumbai 1st   Mumbai  1st   Jungi      20      40
    Pune   2nd     Pune  2nd   Sanju      25      45
    Delhi  1st    Delhi  1st  Ramesh      10      20
    Kanpur 2nd   Kanpur  2nd  Kirpal      15      35
    Mumbai 1st   Mumbai  1st   Jungi      20      40
    Pune   2nd     Pune  2nd   Sanju      25      45
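
    As a minimal sketch of what `drop=False` does in the call above, assuming pandas is imported as `pd` and reusing the data from the question (the variable name `indexed` is only illustrative): the two columns are copied into a two-level index and also stay in the frame, while the rows keep their original order.

```python
import pandas as pd

raw_data = {'city': ['Delhi', 'Kanpur', 'Mumbai', 'Pune', 'Delhi', 'Kanpur', 'Mumbai', 'Pune'],
            'rank': ['1st', '2nd', '1st', '2nd', '1st', '2nd', '1st', '2nd'],
            'name': ['Ramesh', 'Kirpal', 'Jungi', 'Sanju', 'Ramesh', 'Kirpal', 'Jungi', 'Sanju'],
            'score1': [10, 15, 20, 25, 10, 15, 20, 25],
            'score2': [20, 35, 40, 45, 20, 35, 40, 45]}
df = pd.DataFrame(raw_data)

# drop=False copies 'city' and 'rank' into the index and keeps them as columns.
indexed = df.set_index(['city', 'rank'], drop=False)

print(list(indexed.index.names))               # names of the two index levels
print(list(indexed.columns))                   # the original columns are all still present
print(indexed.index.is_monotonic_increasing)   # set_index does not reorder rows
```

    Because `set_index` only relabels the rows, equal (city, rank) pairs are not yet adjacent, which is why the grouped display does not appear at this stage.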
    

    But I would like the rows ordered by the 'city' index level first, and then by the 'rank' level:

                   city rank    name  score1  score2
    city   rank
    Delhi  1st    Delhi  1st  Ramesh      10      20
           1st    Delhi  1st  Ramesh      10      20
    
    Kanpur 2nd   Kanpur  2nd  Kirpal      15      35
           2nd   Kanpur  2nd  Kirpal      15      35
    
    Mumbai 1st   Mumbai  1st   Jungi      20      40
           1st   Mumbai  1st   Jungi      20      40
    
    Pune   2nd     Pune  2nd   Sanju      25      45
           2nd     Pune  2nd   Sanju      25      45
    
    1 answer  |  as of 6 years ago
  • rahlf23  ·  6 years ago

    You are almost there; you just need to apply sort_index():

    df.set_index(['city','rank'], drop=False).sort_index()
    

    This yields:

                   city rank    name  score1  score2
    city   rank                                     
    Delhi  1st    Delhi  1st  Ramesh      10      20
           1st    Delhi  1st  Ramesh      10      20
    Kanpur 2nd   Kanpur  2nd  Kirpal      15      35
           2nd   Kanpur  2nd  Kirpal      15      35
    Mumbai 1st   Mumbai  1st   Jungi      20      40
           1st   Mumbai  1st   Jungi      20      40
    Pune   2nd     Pune  2nd   Sanju      25      45
           2nd     Pune  2nd   Sanju      25      45
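
    The sorting step can be sketched as a self-contained script. This is only an illustration under the question's data, with `sorted_df` as a made-up variable name. It also spells out the level order explicitly through the `level` parameter of `sort_index`, which for this data should give the same result as the plain call.

```python
import pandas as pd

raw_data = {'city': ['Delhi', 'Kanpur', 'Mumbai', 'Pune', 'Delhi', 'Kanpur', 'Mumbai', 'Pune'],
            'rank': ['1st', '2nd', '1st', '2nd', '1st', '2nd', '1st', '2nd'],
            'name': ['Ramesh', 'Kirpal', 'Jungi', 'Sanju', 'Ramesh', 'Kirpal', 'Jungi', 'Sanju'],
            'score1': [10, 15, 20, 25, 10, 15, 20, 25],
            'score2': [20, 35, 40, 45, 20, 35, 40, 45]}
df = pd.DataFrame(raw_data)

indexed = df.set_index(['city', 'rank'], drop=False)

# Sort by the 'city' level first, then by the 'rank' level.
sorted_df = indexed.sort_index(level=['city', 'rank'])

# After sorting, equal (city, rank) pairs sit next to each other.
print(sorted_df.index.get_level_values('city').tolist())
print(sorted_df.index.is_monotonic_increasing)
```

    Once the index is sorted, pandas can display repeated outer-level labels only once, which gives the grouped look requested in the question.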
    

    To remove the duplicate rows, add drop_duplicates():

    df.set_index(['city','rank'], drop=False).sort_index().drop_duplicates()
    

    This yields:

                   city rank    name  score1  score2
    city   rank                                     
    Delhi  1st    Delhi  1st  Ramesh      10      20
    Kanpur 2nd   Kanpur  2nd  Kirpal      15      35
    Mumbai 1st   Mumbai  1st   Jungi      20      40
    Pune   2nd     Pune  2nd   Sanju      25      45
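
    One detail worth illustrating: `drop_duplicates()` compares column values and does not look at the index, so it works here because `drop=False` kept 'city' and 'rank' as columns. The sketch below, with the hypothetical names `deduped` and `by_index`, also shows an index-based alternative using `Index.duplicated`, which for this data should select the same rows.

```python
import pandas as pd

raw_data = {'city': ['Delhi', 'Kanpur', 'Mumbai', 'Pune', 'Delhi', 'Kanpur', 'Mumbai', 'Pune'],
            'rank': ['1st', '2nd', '1st', '2nd', '1st', '2nd', '1st', '2nd'],
            'name': ['Ramesh', 'Kirpal', 'Jungi', 'Sanju', 'Ramesh', 'Kirpal', 'Jungi', 'Sanju'],
            'score1': [10, 15, 20, 25, 10, 15, 20, 25],
            'score2': [20, 35, 40, 45, 20, 35, 40, 45]}
df = pd.DataFrame(raw_data)

sorted_df = df.set_index(['city', 'rank'], drop=False).sort_index()

# Column-based: keeps the first of each set of rows whose column values all match.
deduped = sorted_df.drop_duplicates()

# Index-based alternative: keeps the first row for each (city, rank) index entry.
by_index = sorted_df[~sorted_df.index.duplicated(keep='first')]

print(deduped)
print(deduped.equals(by_index))
```

    The two approaches differ in general: the column-based one keeps rows that share an index entry but differ in some column, while the index-based one keeps only one row per index entry.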