代码之家  ›  专栏  ›  技术社区  ›  bar1

在Pandas中以特定格式添加一列新的timedelta值

  •  2
  • bar1  · 技术社区  · 7 年前

    我有一个数据帧,我想添加一列,其中包含另两列之间的时间差:

     df[Diff] = df['End Time'] - df['Open Time']
     df[Diff]
     0     0 days 01:25:40
     1     0 days 00:41:57
     2     0 days 00:21:47
     3     0 days 16:41:57
     4     0 days 04:32:00
     5     0 days 03:01:57
     6     0 days 01:37:56
     7     0 days 01:13:57
     8     0 days 01:07:56
     9     0 days 02:33:59
     10   29 days 18:33:53
     11    0 days 03:50:56
     12    0 days 01:57:56
    

    我希望此列的格式为“1h 25m”,因此我尝试以小时为单位计算天数:

    diff = df['End Time'] - df['Open Time']
    hours = diff.dt.days * 24 + diff.dt.components.hours
    minutes = diff.dt.components.minutes
    

    并收到以下结果:

    0       1
    1       0
    2       0
    3      16
    4       4
    5       3
    6       1
    7       1
    8       1
    9       2
    10    714
    11      3
    12      1
    dtype: int64h 0     25
    1     41
    2     21
    3     41
    4     32
    5      1
    6     37
    7     13
    8      7
    9     33
    10    33
    11    50
    12    57
    Name: minutes, dtype: int64m
    

     '{}h {}m'.format(hours,minutes)) 
    
    2 回复  |  直到 7 年前
        1
  •  2
  •   cs95 abhishek58g    7 年前

    str 使用 astype ,并根据需要连接cols。

    c = (df['End Time'] - df['Open Time'])\
                  .dt.components[['days', 'hours', 'minutes']]
    df['diff'] = (c.days * 24 + c.hours).astype(str) + 'h ' + c.minutes.astype(str) + 'm'
    df['diff']
    0       1h 25m
    1       0h 41m
    2       0h 21m
    3      16h 41m
    4       4h 32m
    5        3h 1m
    6       1h 37m
    7       1h 13m
    8        1h 7m
    9       2h 33m
    10    714h 33m
    11      3h 50m
    12      1h 57m
    Name: diff, dtype: object
    
        2
  •  1
  •   jezrael    7 年前

    您可以使用 total_seconds 用于转换 timedelta 到秒,然后计数 hours , minutes dt.components :

    s = diff.dt.total_seconds().astype(int)
    
    hours = s // 3600 
    # remaining seconds
    s = s - (hours * 3600)
    # minutes
    minutes = s // 60
    # remaining seconds
    seconds = s - (minutes * 60)
    
    a = hours.astype(str) + 'h ' + minutes.astype(str)
    print (a)
    0       1h 25
    1       0h 41
    2       0h 21
    3      16h 41
    4       4h 32
    5        3h 1
    6       1h 37
    7       1h 13
    8        1h 7
    9       2h 33
    10    714h 33
    11      3h 50
    12      1h 57
    Name: Diff, dtype: object
    

    Zero comment 解决:

    hours = diff.dt.days * 24 + diff.dt.components.hours
    minutes = diff.dt.components.minutes
    
    a = hours.astype(str) + 'h ' + minutes.astype(str)
    print (a)
    0      1h 25m
    1      0h 41m
    2      0h 21m
    3     16h 41m
    4      4h 32m
    5       3h 1m
    6      1h 37m
    7      1h 13m
    8       1h 7m
    9      2h 33m
    10    18h 33m
    11     3h 50m
    12     1h 57m
    dtype: object
    

    a = pd.Series(['{0[0]}h {0[1]}m'.format(x) for x in zip(hours, minutes)])
    print (a)
    0       1h 25m
    1       0h 41m
    2       0h 21m
    3      16h 41m
    4       4h 32m
    5        3h 1m
    6       1h 37m
    7       1h 13m
    8        1h 7m
    9       2h 33m
    10    714h 33m
    11      3h 50m
    12      1h 57m
    dtype: object
    

    :

    #13000 rows
    df = pd.concat([df]*1000).reset_index(drop=True)
    
    In [191]: %%timeit
         ...: hours = diff.dt.days * 24 + diff.dt.components.hours
         ...: minutes = diff.dt.components.minutes
         ...: 
         ...: a = hours.astype(str) + 'h ' + minutes.astype(str)
         ...: 
    1 loop, best of 3: 483 ms per loop
    
    In [192]: %%timeit
         ...: s = diff.dt.total_seconds().astype(int)
         ...: 
         ...: hours = s // 3600 
         ...: # remaining seconds
         ...: s = s - (hours * 3600)
         ...: # minutes
         ...: minutes = s // 60
         ...: # remaining seconds
         ...: seconds = s - (minutes * 60)
         ...: 
         ...: a = hours.astype(str) + 'h ' + minutes.astype(str)
         ...: 
    10 loops, best of 3: 43.9 ms per loop
    
    In [193]: %%timeit
         ...: hours = diff.dt.days * 24 + diff.dt.components.hours
         ...: minutes = diff.dt.components.minutes
         ...: s = pd.Series(['{0[0]}h {0[1]}m'.format(x) for x in zip(hours, minutes)])
         ...: 
    1 loop, best of 3: 465 ms per loop
    

    #cᴏʟᴅsᴘᴇᴇᴅ solution
    In [194]: %%timeit
         ...: c = diff.dt.components[['days', 'hours', 'minutes']]
         ...: a = (c.days * 24 + c.hours).astype(str) + 'h ' + c.minutes.astype(str) + 'm'
         ...: 
    1 loop, best of 3: 208 ms per loop