给出一个dict列表:
data = [{'Monday': '0:0-0:0', 'Tuesday': '8:0-18:30', 'Wednesday': '8:0-18:30', 'Thursday': '8:0-18:30', 'Friday': '8:0-18:30', 'Saturday': '8:0-14:0'},
{'Monday': '8:0-22:0', 'Tuesday': '8:0-22:0', 'Wednesday': '8:0-22:0', 'Thursday': '8:0-22:0', 'Friday': '8:0-23:0', 'Saturday': '8:0-23:0', 'Sunday': '8:0-22:0'},
{'Monday': '7:0-20:0', 'Tuesday': '7:0-20:0', 'Wednesday': '7:0-20:0', 'Thursday': '7:0-20:0', 'Friday': '7:0-21:0', 'Saturday': '7:0-21:0', 'Sunday': '7:0-21:0'},
{'Wednesday': '14:0-22:0', 'Thursday': '16:0-22:0', 'Friday': '12:0-22:0', 'Saturday': '12:0-22:0', 'Sunday': '12:0-18:0'},
{'Monday': '0:0-0:0', 'Tuesday': '6:0-22:0', 'Wednesday': '6:0-22:0', 'Thursday': '6:0-22:0', 'Friday': '9:0-0:0', 'Saturday': '9:0-22:0', 'Sunday': '8:0-22:0'},
{'Monday': '0:0-0:0', 'Tuesday': '10:0-18:0', 'Wednesday': '10:0-18:0', 'Thursday': '10:0-18:0', 'Friday': '10:0-18:0', 'Saturday': '10:0-18:0', 'Sunday': '12:0-18:0'},
{'Monday': '9:0-17:0', 'Tuesday': '9:0-17:0', 'Wednesday': '9:0-17:0', 'Thursday': '9:0-17:0', 'Friday': '9:0-17:0'},
None,
{'Monday': '0:0-0:0', 'Tuesday': '6:0-21:0', 'Wednesday': '6:0-21:0', 'Thursday': '6:0-16:0', 'Friday': '6:0-16:0', 'Saturday': '6:0-17:0', 'Sunday': '6:0-21:0'}]
正在执行:
df = pd.json_normalize(data)
print(df)
输出
Monday Tuesday Wednesday Thursday Friday Saturday Sunday
0 0:0-0:0 8:0-18:30 8:0-18:30 8:0-18:30 8:0-18:30 8:0-14:0 NaN
1 8:0-22:0 8:0-22:0 8:0-22:0 8:0-22:0 8:0-23:0 8:0-23:0 8:0-22:0
2 7:0-20:0 7:0-20:0 7:0-20:0 7:0-20:0 7:0-21:0 7:0-21:0 7:0-21:0
3 NaN NaN 14:0-22:0 16:0-22:0 12:0-22:0 12:0-22:0 12:0-18:0
4 0:0-0:0 6:0-22:0 6:0-22:0 6:0-22:0 9:0-0:0 9:0-22:0 8:0-22:0
5 0:0-0:0 10:0-18:0 10:0-18:0 10:0-18:0 10:0-18:0 10:0-18:0 12:0-18:0
6 9:0-17:0 9:0-17:0 9:0-17:0 9:0-17:0 9:0-17:0 NaN NaN
7 NaN NaN NaN NaN NaN NaN NaN
8 0:0-0:0 6:0-21:0 6:0-21:0 6:0-16:0 6:0-16:0 6:0-17:0 6:0-21:0
正在删除NaN行,并重置索引:
df = df.dropna(how='all').reset_index(drop=True)
print(df)
输出
Monday Tuesday Wednesday Thursday Friday Saturday Sunday
0 0:0-0:0 8:0-18:30 8:0-18:30 8:0-18:30 8:0-18:30 8:0-14:0 NaN
1 8:0-22:0 8:0-22:0 8:0-22:0 8:0-22:0 8:0-23:0 8:0-23:0 8:0-22:0
2 7:0-20:0 7:0-20:0 7:0-20:0 7:0-20:0 7:0-21:0 7:0-21:0 7:0-21:0
3 NaN NaN 14:0-22:0 16:0-22:0 12:0-22:0 12:0-22:0 12:0-18:0
4 0:0-0:0 6:0-22:0 6:0-22:0 6:0-22:0 9:0-0:0 9:0-22:0 8:0-22:0
5 0:0-0:0 10:0-18:0 10:0-18:0 10:0-18:0 10:0-18:0 10:0-18:0 12:0-18:0
6 9:0-17:0 9:0-17:0 9:0-17:0 9:0-17:0 9:0-17:0 NaN NaN
7 0:0-0:0 6:0-21:0 6:0-21:0 6:0-16:0 6:0-16:0 6:0-17:0 6:0-21:0
正确执行您的功能:
def myFunc(v):
if isinstance(v, str):
start, end = v.split('-')
hhs, mms = (int(v) for v in start.split(':'))
hhe, mme = (int(v) for v in end.split(':'))
return (hhe * 60 + mme) - (hhs * 60 + mms)
else:
return v
df = df.applymap(myFunc)
输出
Monday Tuesday Wednesday Thursday Friday Saturday Sunday
0 0.0 630.0 630.0 630.0 630.0 360.0 NaN
1 840.0 840.0 840.0 840.0 900.0 900.0 840.0
2 780.0 780.0 780.0 780.0 840.0 840.0 840.0
3 NaN NaN 480.0 360.0 600.0 600.0 360.0
4 0.0 960.0 960.0 960.0 -540.0 780.0 840.0
5 0.0 480.0 480.0 480.0 480.0 480.0 360.0
6 480.0 480.0 480.0 480.0 480.0 NaN NaN
7 NaN NaN NaN NaN NaN NaN NaN
8 0.0 900.0 900.0 600.0 600.0 660.0 900.0
总额:
>>> df.sum().sum()
30600.0
矢量化(ish),并使用timedelta,而不是重新发明轮子:
for col in df:
x = df[col].str.extract('(\d*):(\d*)-(\d*):(\d*)').astype(float)
h = x.loc[:, ::2].apply(pd.to_timedelta, unit='h')
m = x.loc[:, 1::2].apply(pd.to_timedelta, unit='m')
x = h + m.values
df[col] = x[2] - x[0]
>>> df.sum().sum()
Timedelta('21 days 06:00:00')
>>> df.sum().sum().total_seconds() // 60
30600.0
>>> df
Monday Tuesday Wednesday Thursday Friday Saturday Sunday
0 0 days 00:00:00 0 days 10:30:00 0 days 10:30:00 0 days 10:30:00 0 days 10:30:00 0 days 06:00:00 NaT
1 0 days 14:00:00 0 days 14:00:00 0 days 14:00:00 0 days 14:00:00 0 days 15:00:00 0 days 15:00:00 0 days 14:00:00
2 0 days 13:00:00 0 days 13:00:00 0 days 13:00:00 0 days 13:00:00 0 days 14:00:00 0 days 14:00:00 0 days 14:00:00
3 NaT NaT 0 days 08:00:00 0 days 06:00:00 0 days 10:00:00 0 days 10:00:00 0 days 06:00:00
4 0 days 00:00:00 0 days 16:00:00 0 days 16:00:00 0 days 16:00:00 -1 days +15:00:00 0 days 13:00:00 0 days 14:00:00
5 0 days 00:00:00 0 days 08:00:00 0 days 08:00:00 0 days 08:00:00 0 days 08:00:00 0 days 08:00:00 0 days 06:00:00
6 0 days 08:00:00 0 days 08:00:00 0 days 08:00:00 0 days 08:00:00 0 days 08:00:00 NaT NaT
7 0 days 00:00:00 0 days 15:00:00 0 days 15:00:00 0 days 10:00:00 0 days 10:00:00 0 days 11:00:00 0 days 15:00:00