
What is the most idiomatic way to set all values to NaN except at month end?

  •  0
  • SkyWalker  · Tech Community  · 4 years ago

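    I would like to learn the most idiomatic way to set all values of a data frame to NaN, except those on the last business day of each month. This is what I have so far: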

    import numpy as np

    # get the values corresponding to the last business day of each month
    # ('BM' is the business-month-end alias; pandas 2.2+ spells it 'BME')
    df_eofm = df.resample('BM').last()
    # fill the original data frame with NaN's
    df[:] = np.nan
    # now try to set the last business days to the values we saved
    df.update(df_eofm)
    print(df)
    print(df.dropna())
    

                            Col1                    Col2            Col3
    Date                                                                
    1963-12-31              57.5                     -28            0.89
    1964-01-01               NaN                     NaN             NaN
    1964-01-02               NaN                     NaN             NaN
    1964-01-03               NaN                     NaN             NaN
    1964-01-04               NaN                     NaN             NaN
    ...                      ...                     ...             ...
    2020-03-11               NaN                     NaN             NaN
    2020-03-12               NaN                     NaN             NaN
    2020-03-13               NaN                     NaN             NaN
    2020-03-14               NaN                     NaN             NaN
    2020-03-15               NaN                     NaN             NaN
    
    [20530 rows x 3 columns]
                            Col1                    Col2            Col3
    Date                                                                
    1963-12-31              57.5                     -28            0.89
    1964-01-31                54                     106            0.65
    1964-02-28              57.1                     126            0.68
    1964-03-31              57.9                     266            0.73
    1964-04-30              60.2                     144            0.72
    ...                      ...                     ...             ...
    2019-10-31              47.8                     136            0.11
    2019-11-29              48.3                     128            0.22
    2019-12-31              48.1                     266            0.37
    2020-01-31              47.2                     145           -0.08
    2020-02-28              50.9                     225           -0.45
    
    [675 rows x 3 columns]
    
    1 answer  |  as of 4 years ago
        1
  •  2
  •   yatu Sayali Sonawane    4 years ago

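    You can use is_month_end and index the data frame with the resulting boolean array (note that this flags calendar month ends, not business-day month ends):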

    df[~df.index.is_month_end] = np.nan
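
To make the calendar-versus-business distinction concrete, here is a minimal sketch on made-up data. The frame `demo` and its values are hypothetical; only the masking line comes from the answer. February 1964 ends on a Saturday, so the row that survives is the 29th rather than Friday the 28th that the question's output shows.

```python
import numpy as np
import pandas as pd

# Hypothetical daily data covering February 1964 (a leap year).
idx = pd.date_range("1964-02-01", "1964-02-29", freq="D")
demo = pd.DataFrame({"Col1": np.arange(len(idx), dtype=float)}, index=idx)

# Same masking as in the answer, applied to a copy.
masked = demo.copy()
masked[~masked.index.is_month_end] = np.nan

# Only the calendar month end keeps its value: 1964-02-29, a Saturday.
kept = masked.dropna().index
print(kept)
```

If the data only has rows on trading days, the calendar month end may not be in the index at all, in which case this approach would blank out the whole month.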
    

    To target the last business day of each month instead, adapting this answer, we can do:

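    import numpy as np
    import pandas as pd

    def is_business_day(date):
        # True for Monday-Friday; public holidays are not taken into account
        return bool(len(pd.bdate_range(date, date)))

    # keep only the business days, then take the last one in each calendar month
    dates = df.index.to_series()
    bus_days = dates[dates.map(is_business_day)]
    last_bus = bus_days.groupby(bus_days.index.to_period('M')).max()
    # blank out every row that is not the last business day of its month
    # (a trailing partial month keeps its last available business day)
    df[~df.index.isin(last_bus)] = np.nan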
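
For comparison, a possible alternative (not from the answer above) is to generate the business month-end dates directly and mask everything else. This is only a sketch: the frame `df` below is made-up data, and it assumes a daily `DatetimeIndex`. Passing the `pd.offsets.BMonthEnd()` object as `freq` sidesteps the `'BM'` versus `'BME'` alias difference between pandas versions. Like the snippets above, it knows about weekends only, not public holidays.

```python
import numpy as np
import pandas as pd

# Hypothetical daily data ending mid-month, like the frame in the question.
idx = pd.date_range("2020-01-01", "2020-03-15", freq="D")
df = pd.DataFrame({"Col1": np.arange(len(idx), dtype=float)}, index=idx)

# Business month ends that fall inside the data's date span.
month_ends = pd.date_range(df.index.min(), df.index.max(),
                           freq=pd.offsets.BMonthEnd())

# Boolean mask over the rows: True where the date is a business month end.
keep = df.index.isin(month_ends)

# Work on a copy so the original frame is left untouched.
result = df.copy()
result.loc[~keep, :] = np.nan

print(result.dropna())
```

Because March 2020 is incomplete in this data, no March row is kept, which matches what the `resample` plus `update` approach in the question produces.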