代码之家  ›  专栏  ›  技术社区  ›  Mark K

从数据框a中的特定行开始,计算数据框b中过去一年的事件

  •  1
  • Mark K  · 技术社区  · 3 年前

    我有两个数据框如下,我想返回多少成功(是)在一年(对于一个特定的人)前一年,他/她的具体日期,即每个条目在 to check 定义范围的步骤 history .

    例如,在 to_check ,Mike 20200602,我想知道Mike有多少成功(是的)(1年前,直到20200602)。

    enter image description here

    通过使用“tou check”作为列表,我想出了一个笨拙的方法:

    import pandas as pd
    import datetime
    import numpy as np
    from io import StringIO
    import time
    from datetime import datetime, date, time, timedelta
    
    
    csvfile = StringIO("""
    Name Check
    Mike 20200602
    David 20210415
    Kate 20201109""")
    
    csvfile_1 = StringIO("""
    Name History Success
    David 20180312 Yes
    David 20180811 Yes
    David 20191223 Yes
    David 20210311 Yes
    Kate 20180906 Yes
    Kate 20180912 Yes
    Kate 20191204 Yes
    Kate 20200505 Yes
    Mike 20180912 Yes
    Mike 20190312 Yes
    Mike 20190806 Yes
    Mike 20191204 Yes""")
    
    
    df_check = pd.read_csv(csvfile, sep = ' ', engine='python')
    df_history = pd.read_csv(csvfile_1, sep = ' ', engine='python')
    
    df_history['Date'] = pd.to_datetime(df_history['History'], format='%Y%m%d')
    
    to_check = ["Mike 20200602","David 20210415","Kate 20201109"]
    
    for t in to_check:
        name, d = t.split(" ")
        date_obj = datetime.strptime(d, '%Y%m%d')
        delta = timedelta(days = 365)
        day_before = date_obj - delta
        m1 = df_history['Name'] == name
        m2 = df_history['Date'] >= day_before
    
        df_history['OP'] = np.where(m1 & m2, "fit", '')
    
        how_many = df_history['OP'].value_counts().tolist()[1]
    
        print (t, how_many)
    

    输出:

    Mike 20200602 2
    David 20210415 1
    Kate 20201109 2
    

    1 回复  |  直到 3 年前
        1
  •  3
  •   Quang Hoang    3 年前

    merge query ,但我建议将日期保留为数字以便于抵消:

    # both `Check` and `History` are numbers, not dates
    (df_check.merge(df_history, on='Name', how='left')
        .query('History<=Check<History+10000')
        .groupby('Name').agg({'History':'first', 'Success':'size'})
    )
    

    输出:

            History  Success
    Name                    
    David  20210311        1
    Kate   20191204        2
    Mike   20190806        2