
Trying to return records from a DataFrame using a reference DataFrame

  •  1
  • Stacey  · 6 years ago

    I have a DataFrame (DeptTemplate) whose .head() looks like this:

           Name     Status  Status change date   Product
    0       Bob    CURRENT                 NaN   Pencils
    1     Steve    CURRENT                 NaN      Pens
    2   Heather NEW JOINER          02/08/2018     Paper
    3     Lizzy NEW JOINER          06/02/2018      Pens
    4     Ralph       LEFT                 NaN     Paper
    

    I am trying to identify, and return all the information for, records that have a status other than "CURRENT" and no status change date.

     def checkStatusChangeDate(DeptTemplate,filename,filepath, referencePeriodStartDate, referencePeriodEndDate,writer):
            #This code checks if a status is not current that there is a status change date attached
    
            test = DeptTemplate[DeptTemplate.Status != "CURRENT"]
    
            pd.to_datetime(test['Status change date'])
    
            test['Status change date'].dt.strftime('%d/%m/%Y')   
    
            statusError = test['Status change date'] == 'NaT'
    
            finalError = DeptTemplate.loc[statusError['Status change date']]
    

    I first identify any records that are not "CURRENT". Then, from this subset, I identify any records that have no status change date. I end up with statusError:

    4    False
    

    The problem I am having is returning the full records from the original DataFrame, DeptTemplate, by referencing statusError.

    I am trying to use:

    finalError = DeptTemplate.loc[statusError['Status change date']]
    

    finalError = DeptTemplate[statusError['Status change date']]
    

    but I cannot get the entire record into the finalError DataFrame.

    The finalError DataFrame should look like:

           Name     Status  Status change date   Product
    4     Ralph       LEFT                 NaN     Paper
    
    2 replies
        1
  •  1
  •   user3471881    6 years ago

    You are almost there, but you are trying to slice your original DataFrame with a mask that was built from a filtered subset of it.
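    To make the failure concrete, here is a minimal sketch that rebuilds the sample from the question (the names `subset`, `status_error` and `lookup_failed` are illustrative, not from the original code). It assumes the missing dates are NaN, as the question's output suggests. `status_error` is a boolean Series, so indexing it with a column name is a label lookup that fails; its index labels, however, still point at rows of the original frame.

```python
import numpy as np
import pandas as pd

# Sample rows copied from the question's DeptTemplate.head()
df = pd.DataFrame({
    "Name": ["Bob", "Steve", "Heather", "Lizzy", "Ralph"],
    "Status": ["CURRENT", "CURRENT", "NEW JOINER", "NEW JOINER", "LEFT"],
    "Status change date": [np.nan, np.nan, "02/08/2018", "06/02/2018", np.nan],
    "Product": ["Pencils", "Pens", "Paper", "Pens", "Paper"],
})

# Same first step as the question: keep only the non-CURRENT rows
subset = df[df["Status"] != "CURRENT"]

# Boolean Series indexed by the surviving row labels (2, 3, 4)
status_error = subset["Status change date"].isna()

# A Series has no columns, so this is a label lookup and raises KeyError
lookup_failed = False
try:
    status_error["Status change date"]
except KeyError:
    lookup_failed = True

# The labels of the flagged rows can be used on the original frame instead
final_error = df.loc[status_error[status_error].index]
print(final_error)
```

    This works because filtering keeps the original index labels, so `.loc` can find the same rows in the unfiltered frame. Building both masks on the original frame, as in the steps that follow, avoids the detour.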

    Step 1: Set up the boolean masks

    not_current = df['Status'] != 'CURRENT'
    
    no_date_change = df['Status change date'].isnull()
    

    Step 2: Apply the masks

    df[not_current & no_date_change]
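    Put together as a runnable sketch on the question's sample data (the frame is rebuilt by hand here, and `final_error` is just an illustrative name):

```python
import numpy as np
import pandas as pd

# Sample rows copied from the question's DeptTemplate.head()
df = pd.DataFrame({
    "Name": ["Bob", "Steve", "Heather", "Lizzy", "Ralph"],
    "Status": ["CURRENT", "CURRENT", "NEW JOINER", "NEW JOINER", "LEFT"],
    "Status change date": [np.nan, np.nan, "02/08/2018", "06/02/2018", np.nan],
    "Product": ["Pencils", "Pens", "Paper", "Pens", "Paper"],
})

# Both masks are built on the full frame, so they share its index
not_current = df["Status"] != "CURRENT"
no_date_change = df["Status change date"].isnull()

# Combine with & (element-wise AND) and select whole rows
final_error = df[not_current & no_date_change]
print(final_error)
```

    Because the masks have the same index as `df`, no alignment between a subset and the original frame is needed.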
    
        2
  •  0
  •   Lukas Humpe    6 years ago

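    If I understand correctly, you want to find the records that have no change date and a status other than CURRENT, and then also return all other records for the same person. So if there are other entries for Ralph, for example, you want to get those as well.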

    My solution is:

    import pandas as pd 
    
    
    data = {"Name":["Bob","Steve","Heather","Lizzy","Ralph","Ralph","Ralph"],
        "Status":["CURRENT","CURRENT","NEW JOINER","NEW JOINER","LEFT","CURRENT","CURRENT"],
        "Status change date": ["","","02/08/2018","06/02/2018","","06/02/2018","06/02/2018"],
        "Product":["Pencils","Pens","Paper","Pens","Paper","Pencils","Pens"]}
    
    df = pd.DataFrame(data)
    
    df["Status change date"]=pd.to_datetime(df["Status change date"])
    
    df
          Name      Status Status change date  Product
    0      Bob     CURRENT                NaT  Pencils
    1    Steve     CURRENT                NaT     Pens
    2  Heather  NEW JOINER         2018-02-08    Paper
    3    Lizzy  NEW JOINER         2018-06-02     Pens
    4    Ralph        LEFT                NaT    Paper
    5    Ralph     CURRENT         2018-06-02  Pencils
    6    Ralph     CURRENT         2018-06-02     Pens
    

    Get all entries that have a status other than CURRENT and no status change date:

    finalError = df[(df["Status"]!="CURRENT") & (df["Status change date"].isnull())]
    
    finalError.head()
    
        Name Status Status change date Product
    4  Ralph   LEFT                NaT   Paper
    

    # Select every row whose Name appears in finalError (also works for several names)
    df[df["Name"].isin(finalError["Name"])]
    
        Name   Status Status change date  Product
    4  Ralph     LEFT                NaT    Paper
    5  Ralph  CURRENT         2018-06-02  Pencils
    6  Ralph  CURRENT         2018-06-02     Pens
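    As a rough sketch, the check can be wrapped in a small helper along the lines of the asker's function. The name `find_status_errors` is hypothetical. It assumes the dates are day-first strings, based on the `'%d/%m/%Y'` format used in the question, and the sample data below uses `None` rather than empty strings for missing dates.

```python
import pandas as pd

def find_status_errors(dept_template: pd.DataFrame) -> pd.DataFrame:
    """Return rows whose Status is not CURRENT and whose change date is missing."""
    # Parse into a separate Series so the caller's frame is left untouched.
    # dayfirst=True is an assumption about the input format;
    # errors="coerce" turns unparseable values into NaT.
    dates = pd.to_datetime(
        dept_template["Status change date"], dayfirst=True, errors="coerce"
    )
    mask = (dept_template["Status"] != "CURRENT") & dates.isna()
    return dept_template[mask]

# Illustrative data modelled on the example above
df = pd.DataFrame({
    "Name": ["Bob", "Steve", "Heather", "Lizzy", "Ralph", "Ralph", "Ralph"],
    "Status": ["CURRENT", "CURRENT", "NEW JOINER", "NEW JOINER",
               "LEFT", "CURRENT", "CURRENT"],
    "Status change date": [None, None, "02/08/2018", "06/02/2018",
                           None, "06/02/2018", "06/02/2018"],
    "Product": ["Pencils", "Pens", "Paper", "Pens", "Paper", "Pencils", "Pens"],
})

errors = find_status_errors(df)

# All rows for anyone who has at least one flagged record
related = df[df["Name"].isin(errors["Name"])]
print(errors)
print(related)
```

    Note that `errors="coerce"` also flags rows whose date is present but malformed, which may or may not be what you want.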