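Assuming that your real data contains rows dated on the end date, you can use a group-wise transform to add columns holding the first and last date of each name group, and then filter on those columns.
The following example modifies the data so that Jim and Sara each have a row at the end of February: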
import pandas as pd

# Sample data: jim and sara each get one extra row dated 02-28-19; bob's data stops at 01-10-19.
df = pd.DataFrame({'names':['jim','jim','jim','jim','jim','jim','jim','jim','jim',
                            'bob','bob','bob','bob','bob','bob',
                            'sara','sara','sara','sara','sara','sara','sara','sara','sara','sara', 'jim', 'sara'],
                   'dates':['01-01-19','01-02-19','01-03-19','01-05-19','01-06-19','01-07-19','01-08-19','01-09-19','01-10-19',
                            '01-05-19','01-06-19','01-07-19','01-08-19','01-09-19','01-10-19',
                            '01-01-19','01-02-19','01-03-19','01-04-19','01-05-19','01-06-19','01-07-19','01-08-19','01-09-19','01-10-19', '02-28-19', '02-28-19']})
df['dates'] = pd.to_datetime(df['dates'], format = '%m-%d-%y')
# Broadcast each name group's earliest and latest date back onto every row of that group.
df['first_date'] = df.groupby('names')['dates'].transform('min')
df['last_date'] = df.groupby('names')['dates'].transform('max')
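# Parse the bounds with the same format as the data, so the comparison does not rely on implicit string parsing.
start_date = pd.to_datetime('01-01-19', format='%m-%d-%y')
end_date = pd.to_datetime('02-28-19', format='%m-%d-%y')
# Keep only rows whose name group starts on or before start_date and ends on or after end_date.
df2 = df.loc[(df['first_date'] <= start_date) & (df['last_date'] >= end_date)]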
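
If you would rather not add helper columns, the same idea can probably be expressed with groupby(...).filter. This is only a minimal sketch of an alternative, not the approach used above: the small dataset and the covers_window helper are made up for illustration, and it assumes every group that should be kept has a row on or before the start date and on or after the end date.

```python
import pandas as pd

# Hypothetical, shortened dataset: jim and sara span the whole window, bob does not.
df = pd.DataFrame({
    'names': ['jim', 'jim', 'bob', 'bob', 'sara', 'sara'],
    'dates': ['01-01-19', '02-28-19', '01-05-19', '01-10-19', '01-01-19', '02-28-19'],
})
df['dates'] = pd.to_datetime(df['dates'], format='%m-%d-%y')

start_date = pd.Timestamp('2019-01-01')
end_date = pd.Timestamp('2019-02-28')

def covers_window(group):
    # Hypothetical helper: True when the group's dates reach both ends of the window.
    return bool(group['dates'].min() <= start_date and group['dates'].max() >= end_date)

# filter() keeps every row of the groups for which the helper returns True.
df2 = df.groupby('names').filter(covers_window)
kept_names = sorted(df2['names'].unique())
```

The transform version keeps first_date and last_date around for later inspection, while filter leaves the original columns untouched; on large frames the transform version is usually the faster of the two, since filter calls a Python function once per group.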