代码之家  ›  专栏  ›  技术社区  ›  Vector JX

从具有条件的数据帧查找值

  •  0
  • Vector JX  · 技术社区  · 7 年前

    我有两个数据帧,如下所述:

    DF_1>
        Sr.No.  Stage           Time            Result          Result_2
        1       updated_date    1516868822411   1516868822361   1516868822350
        2       id              1516868822411   ABC             -
        3       engine_date     1516868822411   1516868822000   -
        4       blocked         1516868822411   80000           0
        5       updated_date    1516868822398   1516868822350   1516866877815
        6       list            1516868822398   BCD             -
        7       sub_stat_1      1516868779095   AC-12           AC-14
        8       status_1        1516868642468   AC-25           AC-38
    
    DF_2>
    
    Sr. No.     ID        Type_1 Type_2
    1           AC-12      X      Y
    2           AC-14      XX     YY
    3           AC-25      A      B
    4           AC-38      CC     CD
    

    现在,我想用下面提到的条件从DF\u 2得到vlookup值:

    1. 如果阶段为sub\u stat\u 1,则vlookup Result和Type\u 2的Result\u 2(来自DF\u 2)。
    2. 如果阶段为status\u 1,则vlookup Result和Type\u 1的Result\u 2(来自DF\u 2)。
    3. 如果stage为status\u 1或sub\u stat\u 1,但Result或Result\u 2为nothing,则在输出数据帧中给出“-”值。
    4. 将与DF\u 1 Result和Result\u 2相同的其他值分别保留到所需的输出列Final\u 1和Final\u 2。
    5. 只要时间中有历元时间,Result和Result\u 2列(如果可能)将其分别转换为所需输出列time\u 2、Final\u 1和Final\u 2中的正常时间。

    所需输出数据帧:

    Sr. No.    Stage   Time            Result          Result_2      Time_2                     Final_1                     Final_2
    1      updated_date 1516868822411  1516868822361   1516868822350 25/01/2018 08:27:02        25/01/2018 08:27:02         25/01/2018 08:27:02
    2      id           1516868822411  ABC             -             25/01/2018 08:27:02        ABC                         -
    3      engine_date  1516868822411  1516868822000   -             25/01/2018 08:27:02        25/01/2018 08:27:02         -
    4      blocked      1516868822411  80000           0             25/01/2018 08:27:02        80000                       0
    5      updated_date 1516868822398  1516868822350   1516866877815 25/01/2018 08:27:02        25/01/2018 08:27:02         25/01/2018 07:54:38
    6      list         1516868822398  BCD             -             25/01/2018 08:27:02        BCD                         -
    7      sub_stat_1   1516868779095  AC-12           AC-14         25/01/2018 08:26:19        Y (Output of AC-12)         YY (Output of AC-14)
    8      status_1     1516868642468  AC-25           AC-38         25/01/2018 08:24:02        A (Output of AC-25)         CC (Output of AC-38)
    
    1 回复  |  直到 7 年前
        1
  •  1
  •   Gregor Thomas    7 年前

    我假设您的数据框列是打印的,字符串列是 character 类型,非 factor . 将其转换为 性格 如果他们还没有。(请参见底部的示例数据。)

    # first we will create empty FINAL columns
    DF_1$Final_1 = NA
    DF_1$Final_2 = NA
    

    如果阶段为sub\u stat\u 1,则vlookup Result和Result\u 2来自类型\u 2(来自DF\u 2)

    cond = DF_1[, 'Stage'] == "sub_stat_1"
    from = "Type_2"
    DF_1[cond, "Final_1"] = DF_2[match(DF_1[cond, "Result"], DF_2[, 'ID']), from]
    DF_1[cond, "Final_2"] = DF_2[match(DF_1[cond, "Result_2"], DF_2[, 'ID']), from]
    

    如果阶段为status\u 1,则vlookup Result和Type\u 1的Result\u 2(来自DF\u 2)。

    cond = DF_1[, 'Stage'] == "status_1"
    from = "Type_1"
    DF_1[cond, "Final_1"] = DF_2[match(DF_1[cond, "Result"], DF_2[, 'ID']), from]
    DF_1[cond, "Final_2"] = DF_2[match(DF_1[cond, "Result_2"], DF_2[, 'ID']), from]
    

    如果stage为status\u 1或sub\u stat\u 1,但Result或Result\u 2为nothing,则在输出数据帧中给出“-”值。

    我用缺失的值初始化, NA . 我鼓励你离开他们,但如果你真的想,你可以 DF_1[is.na(DF_1)] = "-" .

    将与DF\u 1 Result和Result\u 2相同的其他值分别保留到所需的输出列Final\u 1和Final\u 2。

    cond = ! DF_1[, 'Stage'] %in% c("status_1", "sub_stat_1")
    DF_1[cond, "Final_1"] = DF_1[cond, "Result"]
    DF_1[cond, "Final_2"] = DF_1[cond, "Result_2"]
    

    只要时间中有历元时间,Result和Result\u 2列(如果可能)将其分别转换为所需输出列time\u 2、Final\u 1和Final\u 2中的正常时间。

    我把这个留给你-你可以用 as.POSIXct() 在你的大纪元时代,如果你提供原点,但你的整数在我看来太大了。你可能想 format 在将它们插入最后一列之前,您可以控制它们转换为字符时的外观。如果你在这方面需要更多的帮助,可以问一个单独的问题。

    DF_1
    #   Sr.No.        Stage         Time        Result      Result_2       Final_1       Final_2
    # 1      1 updated_date 1.516869e+12 1516868822361 1516868822350 1516868822361 1516868822350
    # 2      2           id 1.516869e+12           ABC             -           ABC             -
    # 3      3  engine_date 1.516869e+12 1516868822000             - 1516868822000             -
    # 4      4      blocked 1.516869e+12         80000             0         80000             0
    # 5      5 updated_date 1.516869e+12 1516868822350 1516866877815 1516868822350 1516866877815
    # 6      6         list 1.516869e+12           BCD             -           BCD             -
    # 7      7   sub_stat_1 1.516869e+12         AC-12         AC-14             Y            YY
    # 8      8     status_1 1.516869e+12         AC-25         AC-38             A            CC
    

    使用此数据:

    DF_1
    #   Sr.No.  Stage         Time        Result      Result_2       Final_1       Final_2
    # 1  updated_date 1.516869e+12 1516868822361 1516868822350 1516868822361 1516868822350
    # 2            id 1.516869e+12           ABC             -           ABC             -
    # 3   engine_date 1.516869e+12 1516868822000             - 1516868822000             -
    # 4       blocked 1.516869e+12         80000             0         80000             0
    # 5  updated_date 1.516869e+12 1516868822350 1516866877815 1516868822350 1516866877815
    # 6          list 1.516869e+12           BCD             -           BCD             -
    # 7    sub_stat_1 1.516869e+12         AC-12         AC-14             Y            YY
    # 8      status_1 1.516869e+12         AC-25         AC-38             A            CC
    

    使用此数据:

    DF_1 = read.table(text = "Sr.No.  Stage       Time            Result      
    
        Result_2
        1   updated_date    1516868822411   1516868822361   1516868822350
        2   id              1516868822411   ABC             -
        3   engine_date     1516868822411   1516868822000   -
        4   blocked         1516868822411   80000           0
        5   updated_date    1516868822398   1516868822350   1516866877815
        6   list            1516868822398   BCD             -
        7   sub_stat_1      1516868779095   AC-12           AC-14
        8   status_1        1516868642468   AC-25           AC-38", check.names = F, stringsAsFactors = FALSE, header = T)
    
    DF_2 = read.table(text = "Sr.No. ID     Type_1 Type_2
    1   AC-12      X      Y
    2   AC-14      XX     YY
    3   AC-25      A      B
    4   AC-38      CC     CD", check.names = F, stringsAsFactors = FALSE, header = T)