
drop(<index_value>) removes all rows with that index from df, but <index_value> remains in df.index

  • KOB  · 6 years ago

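    I have a df with a two-level MultiIndex made up of ['ID', 'Date']. The df is sorted by ID and then by Date. The IDs range from 1 to 5. I am trying to drop all rows with ID == 1. This works, but df.index still shows all the 1 values.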

    print(data.head(1))
    print(data.index)
    
    data.drop(1, inplace=True)
    
    print(data.head(1))
    print(data.index)
    

    This outputs the following:

                        Inc       Exp  Inc_Label  Exp_Label
    ID Date                                                
    1  1993-12-31  0.064379  0.004731   0.083734   0.009975
       1994-12-31  0.067377  0.009975   0.084116   0.015092
       1995-12-31  0.067766  0.015092   0.087881   0.017213
    MultiIndex(levels=[[1, 2, 3, 4, 5], ['1968-12-31', '1969-12-31', '1970-12-31', '1971-12-31', '1972-12-31', '1973-12-31', '1974-12-31', '1975-12-31', '1976-12-31', '1977-12-31', '1978-12-31', '1979-12-31', '1980-12-31', '1981-12-31', '1982-12-31', '1983-12-31', '1984-12-31', '1985-12-31', '1986-12-31', '1987-12-31', '1988-12-31', '1989-12-31', '1990-12-31', '1991-12-31', '1992-12-31', '1993-12-31', '1994-12-31', '1995-12-31', '1996-12-31', '1997-12-31', '1998-12-31', '1999-12-31', '2000-12-31', '2001-12-31', '2002-12-31', '2003-12-31', '2004-12-31', '2005-12-31', '2006-12-31', '2007-12-31', '2008-12-31', '2009-12-31', '2010-12-31', '2011-12-31', '2012-12-31', '2013-12-31', '2014-12-31', '2015-12-31', '2016-12-31', '2017-12-31', '2018-12-31', '2019-12-31', '2020-12-31', '2021-12-31', '2022-12-31', '2023-12-31', '2024-12-31', '2025-12-31', '2026-12-31', '2027-12-31', '2028-12-31', '2029-12-31', '2030-12-31', '2031-12-31', '2032-12-31', '2033-12-31', '2034-12-31', '2035-12-31', '2036-12-31', '2037-12-31', '2038-12-31', '2039-12-31', '2040-12-31', '2041-12-31', '2042-12-31', '2043-12-31', '2044-12-31', '2045-12-31', '2046-12-31', '2047-12-31', '2048-12-31', '2049-12-31', '2050-12-31', '2051-12-31', '2052-12-31', '2053-12-31', '2054-12-31', '2055-12-31', '2056-12-31', '2057-12-31', '2058-12-31', '2059-12-31', '2060-12-31', '2061-12-31', '2062-12-31', '2063-12-31', '2064-12-31', '2065-12-31', '2066-12-31', '2067-12-31', '2068-12-31', '2069-12-31', '2070-12-31', '2071-12-31', '2072-12-31', '2073-12-31', '2074-12-31', '2075-12-31', '2076-12-31', '2077-12-31', '2078-12-31', '2079-12-31']],
           labels=[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4], [25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 
45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81]],
           names=['ID', 'Date'])
                        Inc       Exp  Inc_Label  Exp_Label
    ID Date                                                
    2  1973-12-31  0.056571  0.001702   0.073810   0.001010
       1974-12-31  0.057276  0.001010   0.076057   0.000000
       1975-12-31  0.059563  0.000000   0.077986   0.002915
    MultiIndex(levels=[[1, 2, 3, 4, 5], ['1968-12-31', '1969-12-31', '1970-12-31', '1971-12-31', '1972-12-31', '1973-12-31', '1974-12-31', '1975-12-31', '1976-12-31', '1977-12-31', '1978-12-31', '1979-12-31', '1980-12-31', '1981-12-31', '1982-12-31', '1983-12-31', '1984-12-31', '1985-12-31', '1986-12-31', '1987-12-31', '1988-12-31', '1989-12-31', '1990-12-31', '1991-12-31', '1992-12-31', '1993-12-31', '1994-12-31', '1995-12-31', '1996-12-31', '1997-12-31', '1998-12-31', '1999-12-31', '2000-12-31', '2001-12-31', '2002-12-31', '2003-12-31', '2004-12-31', '2005-12-31', '2006-12-31', '2007-12-31', '2008-12-31', '2009-12-31', '2010-12-31', '2011-12-31', '2012-12-31', '2013-12-31', '2014-12-31', '2015-12-31', '2016-12-31', '2017-12-31', '2018-12-31', '2019-12-31', '2020-12-31', '2021-12-31', '2022-12-31', '2023-12-31', '2024-12-31', '2025-12-31', '2026-12-31', '2027-12-31', '2028-12-31', '2029-12-31', '2030-12-31', '2031-12-31', '2032-12-31', '2033-12-31', '2034-12-31', '2035-12-31', '2036-12-31', '2037-12-31', '2038-12-31', '2039-12-31', '2040-12-31', '2041-12-31', '2042-12-31', '2043-12-31', '2044-12-31', '2045-12-31', '2046-12-31', '2047-12-31', '2048-12-31', '2049-12-31', '2050-12-31', '2051-12-31', '2052-12-31', '2053-12-31', '2054-12-31', '2055-12-31', '2056-12-31', '2057-12-31', '2058-12-31', '2059-12-31', '2060-12-31', '2061-12-31', '2062-12-31', '2063-12-31', '2064-12-31', '2065-12-31', '2066-12-31', '2067-12-31', '2068-12-31', '2069-12-31', '2070-12-31', '2071-12-31', '2072-12-31', '2073-12-31', '2074-12-31', '2075-12-31', '2076-12-31', '2077-12-31', '2078-12-31', '2079-12-31']],
           labels
           names=['ID', 'Date'])
    

    Later, I try to create a dict in which each key is one of the IDs and each value is the corresponding sub-df of the original df:

    dict = {index: df.loc[index] for index in df.index.levels[0]}
    

    This throws:

    KeyError: 'the label [1] is not in the [index]'
    

    I don't understand what is happening: after the drop, the levels of the MultiIndex stay the same, but the labels are different.

    1 answer  |  as of 6 years ago
  •   ALollz    6 years ago

    You need to remove the unused levels from the index, and pandas has a method that does exactly that: pandas.MultiIndex.remove_unused_levels

    data.index = data.index.remove_unused_levels()
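
    To make the effect concrete, here is a minimal sketch that reproduces the stale level and then prunes it. The small frame and the names stale_ids, live_ids and sub_frames are made up for illustration; only the ['ID', 'Date'] index layout comes from the question.

```python
import pandas as pd

# Hypothetical stand-in for the asker's data: a two-level index of ID and Date.
data = pd.DataFrame(
    {"Inc": [0.1, 0.2, 0.3, 0.4]},
    index=pd.MultiIndex.from_tuples(
        [
            (1, "1993-12-31"),
            (1, "1994-12-31"),
            (2, "1973-12-31"),
            (2, "1974-12-31"),
        ],
        names=["ID", "Date"],
    ),
)

# Drop every row whose first-level key is 1.
data = data.drop(1)

# The stored level still lists 1, even though no row refers to it any more.
stale_ids = list(data.index.levels[0])

# Prune the levels so they only list values that actually occur.
data.index = data.index.remove_unused_levels()
live_ids = list(data.index.levels[0])

# The comprehension from the question now only visits IDs that exist.
sub_frames = {idx: data.loc[idx] for idx in data.index.levels[0]}
```

    Before pruning, the same comprehension would try data.loc[1] and fail, because 1 is still listed in levels[0] but no longer has any rows.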
    

    However, if you just want to create a dictionary of the unique groups, you should use groupby:

    dct = dict((id, gp) for id, gp in data.groupby(level=0))
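
    One difference from the df.loc[index] approach in the question is worth a quick sketch: each group returned by groupby keeps the full two-level index, whereas df.loc[index] returns a frame indexed by the remaining level only. The frame below and the name dct_flat are illustrative assumptions; droplevel is one way to get the same shape as the loc version, if that is what you want.

```python
import pandas as pd

# Hypothetical data with the same index layout as in the question.
data = pd.DataFrame(
    {"val": [11, 12, 13, 14, 15]},
    index=pd.MultiIndex.from_arrays(
        [
            [1, 1, 1, 2, 2],
            ["1993-12-31", "1994-12-31", "1995-12-31", "1973-12-31", "1974-12-31"],
        ],
        names=["ID", "Date"],
    ),
).drop(1)

# groupby only yields groups that have rows, so the stale ID 1 never shows up,
# even though the unused level has not been removed here.
dct = {key: grp for key, grp in data.groupby(level=0)}

# Optional: drop the now-constant ID level from each group.
dct_flat = {key: grp.droplevel(0) for key, grp in data.groupby(level=0)}
```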
    

    Also, avoid naming a variable dict, because that overrides the built-in dict function that I use above.


    Sample data

    import pandas as pd
    
    df1 = pd.DataFrame({'id1': [1,1,1,2,2],
                       'id2': list('ABCAB'),
                       'val': [11,12,13,14,15]})
    df1 = df1.set_index(['id1', 'id2'])
    df1.index
    #MultiIndex(levels=[[1, 2], ['A', 'B', 'C']],
    #           labels=[[0, 0, 0, 1, 1], [0, 1, 2, 0, 1]],
    #           names=['id1', 'id2'])
    
    df2 = df1.drop(1)
    df2.index
    #MultiIndex(levels=[[1, 2], ['A', 'B', 'C']],
    #           labels=[[1, 1], [0, 1]],
    #           names=['id1', 'id2'])
    
    df2.index = df2.index.remove_unused_levels()
    df2.index
    #MultiIndex(levels=[[2], ['A', 'B']],
    #           labels=[[0, 0], [0, 1]],
    #           names=['id1', 'id2'])
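
    As a side note that goes beyond the answer above: if you would rather not touch the index at all, one possible alternative is to iterate over the values that are actually present, using get_level_values instead of levels. The sketch below reuses the sample data; the names stored, present and subframes are made up for illustration. (Newer pandas versions print codes where the output above shows labels.)

```python
import pandas as pd

df1 = pd.DataFrame({'id1': [1, 1, 1, 2, 2],
                    'id2': list('ABCAB'),
                    'val': [11, 12, 13, 14, 15]}).set_index(['id1', 'id2'])
df2 = df1.drop(1)

# levels is the stored lookup table and may contain unused entries.
stored = list(df2.index.levels[0])

# get_level_values is computed from the rows, so it only contains keys in use.
present = list(df2.index.get_level_values('id1').unique())

# Build the dictionary of sub-frames from the keys that are really there.
subframes = {key: df2.loc[key] for key in present}
```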