
Flag Daylight Saving Time (DST) hours in a datetime column

  •  1
  • Sean McCarthy  · Tech Community  · 6 years ago

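    I created a dataframe of hourly dates, and now I want to add a column that flags whether each row (hour) falls in Daylight Saving Time. For example, the flag should be 1 in summer and 0 in winter.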

    # Localized dates dataframe
    dates = pd.DataFrame(data=pd.date_range('2018-1-1', '2019-1-1', freq='h', tz='America/Denver'), columns=['date_time'])
    
    # My failed attempt to create the flag column
    dates['dst_flag'] = np.where(dates['date_time'].dt.daylight_saving_time == True, 1, 0)
    
    2 replies  |  as of 6 years ago
        1
  •  3
  •   Matt Messersmith    6 years ago

    import pandas as pd
    import pytz
    
    # Generate data (as opposed to an index)
    date_range = pd.date_range('1/1/2018', '1/1/2019', freq='h', tz='America/Denver')
    
    # Localized dates dataframe
    df = pd.DataFrame({'date_time': date_range})
    
    # Map transition times to year for some efficiency gain.
    # The stored transition times are naive UTC, so attach UTC before converting;
    # calling astimezone() on a naive datetime would assume the machine's local zone.
    tz = pytz.timezone('America/Denver')
    transition_times = tz._utc_transition_times[1:]
    transition_times = [pytz.utc.localize(t).astimezone(tz) for t in transition_times]
    transition_times_by_year = {}
    for start_time, stop_time in zip(transition_times[::2], transition_times[1::2]):
        year = start_time.year
        transition_times_by_year[year] = [start_time, stop_time]
    
    # If the date is in DST, mark true, else false.
    # The stop time is excluded because it is already the first standard-time instant.
    def mark_dst(dates):
        for date in dates:
            start_dst, stop_dst = transition_times_by_year[date.year]
            yield start_dst <= date < stop_dst
    df['dst_flag'] = list(mark_dst(df['date_time']))
    
    # Do a quick sanity check to make sure we did this correctly for year 2018.
    # Use .iloc because the filtered rows no longer start at index label 0.
    dst_times = df.loc[df['dst_flag'], 'date_time']
    dst_start = dst_times.iloc[0]  # First dst time 2018
    dst_end = dst_times.iloc[-1]  # Last dst time 2018
    print(dst_start)
    print(dst_end)
    

    This outputs:

    2018-03-11 03:00:00-06:00
    2018-11-04 01:00:00-06:00
    

    which matches the 2018 DST period for America/Denver (March 11 to November 4).

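    A few caveats:

    1. pd.date_range generates an index, not data. I modified the original code slightly so that it is data rather than an index. I assume you already have the data.

    2. The way tz._utc_transition_times is structured is a bit goofy. It is a sequence of start/stop UTC DST transition times, but the early dates contain some odd entries. It should be fine from 1965 onward, though. If you work with dates earlier than that, change tz._utc_transition_times[1:] to tz._utc_transition_times. Note: not every year before 1965 is present.

    3. tz._utc_transition_times is "Python private". It may change without warning or notice, and it may or may not work with future or past versions of pytz. I'm using pytz

    Good luck with your research/regression problem!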
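
    Caveat 3 can be sidestepped with public API only. The following is a minimal sketch, not part of the original answer: every tz-aware timestamp exposes the standard dst() method, which returns a zero timedelta in standard time and a non-zero one during DST. The helper name is_dst is made up for illustration.

```python
from datetime import timedelta

import pandas as pd

# Hourly, tz-aware timestamps, as in the question.
df = pd.DataFrame({
    'date_time': pd.date_range('2018-01-01', '2019-01-01', freq='h', tz='America/Denver')
})


def is_dst(ts):
    """Return 1 if the tz-aware timestamp falls in DST, else 0 (hypothetical helper)."""
    # dst() is the DST adjustment: timedelta(0) in standard time, non-zero in DST.
    return int(ts.dst() != timedelta(0))


df['dst_flag'] = df['date_time'].map(is_dst)

# Same sanity check as above: first and last flagged hours of 2018.
flagged = df.loc[df['dst_flag'] == 1, 'date_time']
print(flagged.iloc[0])
print(flagged.iloc[-1])
```

    This still loops over rows in Python, so it trades speed for not depending on any private attribute.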

        2
  •  1
  •   Nick Klavohn    4 years ago

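    The basic idea is to find the difference between the current time in your time zone and UTC time. In the winter months, local time is one extra hour behind UTC. Whatever the difference is, add what is needed to get to 1 or 0 for the flag.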

    import pandas as pd
    
    start = pd.to_datetime('2020-10-30')
    end = pd.to_datetime('2020-11-02')
    dates = pd.date_range(start=start, end=end, freq='h', tz='America/Denver')
    df1 = pd.DataFrame({'dst_flag': 1, 'date1': dates.tz_localize(None)}, index=dates)
    
    # add extra day on each end so that there are no nan's after the join    
    dates = pd.to_datetime(pd.date_range(start=start - pd.to_timedelta(1, 'd'), end=end + pd.to_timedelta(1, 'd'), freq='h'), utc=True)
    df2 = pd.DataFrame({'date2': dates.tz_localize(None)}, index=dates)
        
    out = df1.join(df2)
    out['dst_flag'] = (out['date1'] - out['date2']) / pd.to_timedelta(1, unit='h') + 7
    out.drop(columns=['date1', 'date2'], inplace=True)
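
    The same offset idea can also be sketched without the join. This assumes the zone's standard-time offset is known in advance (UTC-7 for America/Denver, the counterpart of the hard-coded + 7 above); the names local_naive, utc_naive and standard_offset are illustrative only.

```python
import pandas as pd

dates = pd.date_range('2020-10-30', '2020-11-02', freq='h', tz='America/Denver')
df = pd.DataFrame({'date_time': dates})

# Wall-clock time in the zone and in UTC, both with the time zone stripped.
local_naive = df['date_time'].dt.tz_localize(None)
utc_naive = df['date_time'].dt.tz_convert('UTC').dt.tz_localize(None)

# UTC offset per row: -7 h in standard time and -6 h in DST for Denver.
utc_offset = local_naive - utc_naive

# Assumption: the standard-time offset of the zone is known (UTC-7 here).
standard_offset = pd.Timedelta(hours=-7)
df['dst_flag'] = (utc_offset != standard_offset).astype(int)

print(df['dst_flag'].value_counts())
```

    Because the comparison runs on whole columns, no per-row Python function is called, and the flag comes out as an integer rather than a float.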
    
        3
  •  0
  •   Sean McCarthy    6 years ago

    Here is what I ended up doing, and it works for my purposes:

    import pandas as pd
    import pytz
    
    # Create dates table and flag Daylight Saving Time dates
    dates = pd.DataFrame(data=pd.date_range('2018-1-1', '2018-12-31-23', freq='h'), columns=['date_time'])
    
    # Create a list of start and end dates for DST in each year, in UTC time
    dst_changes_utc = pytz.timezone('America/Denver')._utc_transition_times[1:]
    
    # Convert to local times from UTC times and then remove timezone information
    dst_changes = [pd.Timestamp(i).tz_localize('UTC').tz_convert('America/Denver').tz_localize(None) for i in dst_changes_utc]
    
    flag_list = []
    for index, row in dates['date_time'].items():
        # Isolate the start and end dates for DST in each year
        dst_dates_in_year = [date for date in dst_changes if date.year == row.year]
        spring = dst_dates_in_year[0]
        fall = dst_dates_in_year[1]
        if (row >= spring) & (row < fall):
            flag = 1
        else:
            flag = 0
        flag_list.append(flag)
    print(flag_list)
    dates['dst_flag'] = flag_list
    del(flag_list)
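
    For naive timestamps like these, one loop-free alternative is to localize the column first and then ask each timestamp for its DST adjustment through the public dst() method. This is only a sketch under two assumptions, both labeled in the code: the 01:00 hour that is ambiguous on the November changeover day is read as standard time, and the 02:00 hour that does not exist on the March changeover day is shifted forward to 03:00.

```python
from datetime import timedelta

import numpy as np
import pandas as pd

# Naive hourly timestamps meant as Denver wall-clock time.
dates = pd.DataFrame({
    'date_time': pd.date_range('2018-01-01', '2018-12-31 23:00', freq='h')
})

localized = dates['date_time'].dt.tz_localize(
    'America/Denver',
    # Assumption: an ambiguous wall-clock hour is treated as standard time.
    ambiguous=np.zeros(len(dates), dtype=bool),
    # Assumption: a non-existent wall-clock hour is moved forward to the next valid time.
    nonexistent='shift_forward',
)

# dst() is timedelta(0) in standard time and non-zero during DST.
dates['dst_flag'] = localized.map(lambda ts: int(ts.dst() != timedelta(0)))
```

    One difference from the loop above: the skipped 02:00 row on March 11 is flagged 1 here because it is shifted into DST, whereas the loop flags it 0.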
    
        4
  •  0
  •   Victor L    4 years ago

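    The vectorized approach below seems to work well. The idea behind it is the same as Nick Klavohn's: find the difference between the current time in your time zone and UTC time.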

    import pandas as pd
    
    # Localized dates dataframe
    df = pd.DataFrame(data=pd.date_range('2018-1-1', '2019-1-1', freq='h', tz='America/Denver'), columns=['date_time'])
    
    # UTC offset in whole hours, e.g. '-0700' -> -7.0
    df['utc_offset'] = df['date_time'].dt.strftime('%z').str[0:3].astype(float)
    df['utc_offset_shifted'] = df['utc_offset'].shift(-1)
    # Non-zero where the offset changes between this row and the next
    df['dst'] = df['utc_offset'] - df['utc_offset_shifted']
    # Keep only the transition rows; the last row has no successor (NaN) and is dropped
    df_dst = df[(df['dst'] != 0) & df['dst'].notna()]
    df_dst = df_dst.drop(['utc_offset', 'utc_offset_shifted'], axis=1).reset_index(drop=True)
    
    print(df_dst)
    

                      date_time  dst
    0 2018-03-11 01:00:00-07:00 -1.0
    1 2018-11-04 01:00:00-06:00  1.0