
Python time series: aggregating daily data in a dictionary into weekly data

  •  0
  • Kang  · Tech Community  · 7 years ago

    I have a dictionary like the following.

    my_dict.keys() = 
    dict_keys([20160101, 20160102, 20160103, 20160104, 20160105, 20160106,
           20160107, 20160108, 20160109, 20160110, 20160111, 20160112,
           20160113, 20160114, 20160115, 20160116, 20160117, 20160118,
           20160119, 20160120, 20160121, 20160122, 20160123, 20160124,
           ......    
           20171203, 20171204, 20171213, 20171215, 20171216, 20171217,
           20171218, 20171219, 20171220, 20171221, 20171222, 20171223,
           20171224, 20171225, 20171226, 20171227, 20171228, 20171229,
           20171230, 20171231])
    
    my_dict[20160101] = 
    array([[ 0.,  0.,  1.,  0.,  0.,  0.],
           [ 0.,  0.,  0.,  0.,  0.,  0.],
           [ 0.,  0.,  0.,  2.,  0.,  0.],
           [ 0.,  0.,  0.,  0.,  0.,  0.],
           [ 1.,  0.,  0.,  0.,  0.,  2.],
           [ 0.,  0.,  4.,  0.,  0.,  0.]])
    

    As you may have noticed, my keys represent dates, and each date holds a 6-by-6 array of floats. The array layout is the same for every key in my_dict.

    An important point to note: my_dict does not contain every day. For example, 20171204 is followed by 20171213 and 20171215, so dates can be skipped.

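    My task is to aggregate the daily data (which does not cover every day) into weekly data by summing all the values that fall within each week. In other words, from the first week of 2016 to the last week of 2017, add up every value within a week and produce weekly data. Also, since the first week of 2016 starts on 20160103 (Sun), I can ignore the 20160101 and 20160102 entries in my_dict, as well as the data for the last week of 2017. Could you help me solve this? Thanks in advance!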

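    -------EDIT--------- My question does not seem to have been clear enough, so here is a quick example. Since I want to follow the pandas datetime week convention, every week starts on Sunday. The first week of 2016 would therefore be 20160103, 20160104, 20160105, 20160106, 20160107, 20160108, 20160109.

    So in my new dictionary, weekly_dict[201601] (where 201601 denotes the first week of 2016) adds up all the values under the keys 20160103, 20160104, 20160105, 20160106, 20160107, 20160108 and 20160109 and stores the result as its value.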

    weekly_dict = {}
    weekly_dict[201601] = my_dict[20160103] + my_dict[20160104] + my_dict[20160105] + my_dict[20160106] + my_dict[20160107] + my_dict[20160108] + my_dict[20160109]
    

    And so on for the following weeks. I hope this makes sense. Thanks.

    2 replies  |  as of 7 years ago
        1
  •  1
  •   PMende    7 years ago

    This might be a job for pandas:

    import pandas as pd
    
    # First, get a list of keys
    date_ints = list(my_dict)
    # Turn them into a pandas Series object
    date_int_series = pd.Series(date_ints)
    # Cast them to strings, then parse them into full datetimes with the proper
    # format specification
    datetime_series = pd.to_datetime(date_int_series.astype('str'), format='%Y%m%d')
    # Create a dictionary mapping each date integer -> ISO week of the year.
    # (.dt.week was removed in pandas 2.0; .dt.isocalendar().week replaces it.
    # ISO weeks start on Monday, and week numbers repeat from year to year.)
    date_int_to_week = dict(zip(date_int_series, datetime_series.dt.isocalendar().week))
    

    This dictionary has the keys of my_dict as its keys and the corresponding week of the year as its values.
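
    One caveat with a week-of-year mapping like this: ISO week numbers start their weeks on Monday, while the question asks for weeks starting on Sunday. As a minimal sketch using only the standard library (the helper name `week_numbers` is made up for illustration), the two conventions can be compared side by side; `%U` is the `strftime` directive for the Sunday-based week of the year:

```python
from datetime import datetime

def week_numbers(date_int):
    """Return (ISO week, Sunday-based week) for an integer date like 20160103."""
    day = datetime.strptime(str(date_int), '%Y%m%d')
    # ISO 8601 numbering: weeks start on Monday
    iso_week = day.isocalendar()[1]
    # %U numbering: weeks start on Sunday; days before the first Sunday are week 0
    sunday_week = int(day.strftime('%U'))
    return iso_week, sunday_week

for d in (20160101, 20160103, 20160104, 20160110):
    print(d, week_numbers(d))
```

    Under the Sunday-based numbering, 20160103 through 20160109 all fall in week 1, which matches the weekly_dict[201601] example in the question.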

    Edit:

    If what you are looking for is to sum the entries of the original dictionary by week, you can do the following:

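    from functools import reduce
    
    import numpy as np
    
    # Group the date integers by week number
    week_to_date_list = {}
    for date_int, week in date_int_to_week.items():
        if week not in week_to_date_list:
            week_to_date_list[week] = []
        week_to_date_list[week].append(date_int)
    
    # Add up the arrays of each week element-wise
    my_dict_weekly = {}
    for week in week_to_date_list:
        arrays_in_week = [my_dict[day_int] for day_int in week_to_date_list[week]]
        # np.add sums two arrays element-wise; the built-in sum is not a valid reducer here
        my_dict_weekly[week] = reduce(np.add, arrays_in_week)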
    

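    my_dict_weekly should now be a dictionary keyed by week of the year, whose values are the sum of all the arrays falling in that week. If you are using Python 3, you need to import reduce from functools.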
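
    Since the data covers both 2016 and 2017, a key made of the week number alone would merge the same week of both years. The following is a rough sketch, not the method of the answer above, that keys each week by year plus Sunday-based week number (so 201601 is the week of 20160103 to 20160109) and tolerates missing days. The function name `weekly_sums`, the `skip_week_zero` flag, and the sample data are assumptions made for this illustration:

```python
from datetime import datetime

import numpy as np

def weekly_sums(daily, skip_week_zero=True):
    """Sum daily arrays into weeks keyed by the integer YYYYWW (weeks start on Sunday)."""
    weekly = {}
    for date_int in sorted(daily):
        day = datetime.strptime(str(date_int), '%Y%m%d')
        week = int(day.strftime('%U'))  # 0 for days before the year's first Sunday
        if skip_week_zero and week == 0:
            continue
        key = day.year * 100 + week
        if key in weekly:
            weekly[key] = weekly[key] + daily[date_int]
        else:
            # Copy so the weekly result never aliases a daily array
            weekly[key] = daily[date_int].copy()
    return weekly

# Tiny made-up data set; 20160105 is deliberately missing
dates = [20160101, 20160103, 20160104, 20160109, 20160110]
sample = {d: np.full((6, 6), float(i)) for i, d in enumerate(dates, start=1)}
result = weekly_sums(sample)
print(sorted(result))
```

    With `skip_week_zero=True` the days before the first Sunday of a year (20160101 and 20160102 here) are dropped, as the question allows. A partial week at the end of the data is kept and can be deleted afterwards if it is not wanted.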

        2
  •  1
  •   Chiheb Nexus    7 years ago

    If I understand your question correctly, I think you can use datetime and timedelta from the datetime module, as in this example:

    from datetime import datetime, timedelta
    
    def get_days_of_week(year, week=1):
        # number of the days
        days = {'Monday': 1, 'Tuesday': 2, 'Wednesday': 3, 
                'Thursday': 4, 'Friday': 5, 'Saturday': 6, 'Sunday': 7}
        # construct the datetime object with the year and the desired week
        a = datetime.strptime('{0}'.format(year), '%Y') + timedelta(days=7*(week-1))
        # Every week should start on Sunday, so skip ahead to the first Sunday
        a += timedelta(days=7-days.get(a.strftime('%A'), 0))
        for k in range(0, 7):
            yield (a + timedelta(days=k)).strftime('%Y%m%d')
    
    days = list(get_days_of_week(2016, week=1))
    print('2016 / week = 1:', days)
    
    days = list(get_days_of_week(2016, week=22))
    print('2016 / week = 22:', days)
    

    Output:

    2016 / week = 1: 
     ['20160103',
     '20160104',
     '20160105',
     '20160106',
     '20160107',
     '20160108',
     '20160109']
    
    2016 / week = 22: 
     ['20160529',
     '20160530',
     '20160531',
     '20160601',
     '20160602',
     '20160603',
     '20160604']
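
    The generated day strings can then be used to look up and add the arrays in my_dict. Because my_dict uses integer keys and may lack some days, the lookup has to convert each string and skip absent dates. A small sketch under those assumptions (the helper name `sum_available_days` and the sample data are hypothetical):

```python
import numpy as np

def sum_available_days(daily, day_strings):
    """Add up the arrays for those days of a week that exist in `daily`.

    Returns None when none of the days is present.
    """
    arrays = [daily[int(s)] for s in day_strings if int(s) in daily]
    if not arrays:
        return None
    # Element-wise sum over the list of equally shaped arrays
    return np.sum(arrays, axis=0)

# Made-up data: only two days of the first week of 2016 are present
my_dict = {20160103: np.ones((6, 6)), 20160105: 2 * np.ones((6, 6))}
week_1 = ['20160103', '20160104', '20160105', '20160106',
          '20160107', '20160108', '20160109']
total = sum_available_days(my_dict, week_1)
print(total[0, 0])
```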
    

    Edit:

    Based on your latest edit, this code may do what you need:

    from datetime import datetime, timedelta
    
    def get_days_of_week(data):
        # number of the days
        days = {'Monday': 1, 'Tuesday': 2, 'Wednesday': 3,
                'Thursday': 4, 'Friday': 5, 'Saturday': 6, 'Sunday': 7}
        date = datetime.strptime('{}'.format(data), '%Y%m%d')
        # get week number
        week = int(date.strftime('%U'))
        # get year
        year = date.strftime('%Y')
        # construct the datetime object with the year and the desired week
        a = datetime.strptime(year, '%Y') + timedelta(days=7*week)
        # Every week should start on Sunday, so skip ahead to the first Sunday
        a += timedelta(days=7-days.get(a.strftime('%A'), 0))
    
        return {int(str(data)[:-2]): [int((a + timedelta(days=k)).strftime('%Y%m%d')) for k in range(0, 7)]}
    
    week_dict = {}
    week_dict.update(get_days_of_week(20160101))
    week_dict.update(get_days_of_week(20160623))
    print(week_dict[201601])
    print(week_dict[201606])
    
    print(week_dict)
    

    Output:

    [20160103, 20160104, 20160105, 20160106, 20160107, 20160108, 20160109]
    [20160626, 20160627, 20160628, 20160629, 20160630, 20160701, 20160702]
    { 201601: [ 20160103,
                20160104,
                20160105,
                20160106,
                20160107,
                20160108,
                20160109],
      201606: [ 20160626,
                20160627,
                20160628,
                20160629,
                20160630,
                20160701,
                20160702]}
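
    Note that in the snippet above the dictionary key comes from `str(data)[:-2]`, which is the year and month of the input rather than a week number, and that for the two 2016 inputs shown the returned days are the Sunday-to-Saturday week after the input date. If the goal is instead the week that contains a given date, one possible sketch (the helper name `sunday_week` is an assumption for illustration) steps back to the most recent Sunday with plain date arithmetic, which also avoids relying on locale-dependent day names:

```python
from datetime import datetime, timedelta

def sunday_week(date_int):
    """Return the seven integer dates (Sunday to Saturday) of the week containing date_int."""
    day = datetime.strptime(str(date_int), '%Y%m%d').date()
    # weekday() is Monday=0 ... Sunday=6, so this is the number of days since the last Sunday
    days_since_sunday = (day.weekday() + 1) % 7
    start = day - timedelta(days=days_since_sunday)
    return [int((start + timedelta(days=k)).strftime('%Y%m%d')) for k in range(7)]

print(sunday_week(20160623))
print(sunday_week(20160103))
```

    Dates that share the same first element belong to the same week, so that element can serve as a grouping key when summing the daily arrays.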