代码之家  ›  专栏  ›  技术社区  ›  syebill

缺失值提取

  •  0
  • syebill  · 技术社区  · 6 年前

    我有两个数据集,它们的dput如下:

    场景:DF1包含特定日期和时间的单价数据。DF2包含DF1中每个站点代码的竞争对手,以及价格和竞争对手品牌名称。DF2没有DF1中日期和时间的所有单价。我需要以某种方式将df2合并到df1中,这样我就可以获得日期时间、产品、服务类型和品牌,但竞争对手的单价不适用,这样我就可以在后面输入这些值。

    我想做的是:

    1. 根据基于类似站点代码、产品和服务类型的DF1中的值,找出DF2中缺少的日期时间单价。DF1中的每个站点代码在DF2中都有多个“站点代码”,因为可以有多个竞争对手。

    2. 以某种方式提取和合并这些信息,以便我可以在之后输入值。

    假设DF1是

     date StationCode  ServiceType Product unitprice
    1 2017-06-01 06:00:00        1002 Self-Service      GG     1.345
    2 2017-06-01 07:00:00        1002       Served      GG     1.456
    3 2017-06-01 07:00:00        1002 Self-Service      SP     1.012 
    dput(df1) 
    
    
    structure(list(date = structure(c(1496300400,
         1496304000, 1496304000 ), class = c("POSIXct", "POSIXt"), tzone =
         "Etc/GMT+1"), StationCode = c(1002,  1002, 1002), ServiceType =
         structure(c(1L, 2L, 1L), .Label = c("Self-Service",  "Served"), class
         = "factor"), Product = structure(c(1L, 1L, 2L ), .Label = c("GG", "SP"), class = "factor"), unitprice = c(1.345, 
         1.456, 1.012)), class = "data.frame", row.names = c(NA, -3L))
    

    DF2是

    date compstcode StationCode  ServiceType Product unitprice brand
    1 2017-06-01 06:00:00       3456        1002 Self-Service      GG     1.425 Shell
    2 2017-06-01 06:00:00       1267        1002       Served      SP     1.406    BP
    3 2017-06-01 06:00:00       5488        1002 Self-Service      GG     1.011 Total
    
    dput(df2)
        structure(list(date = structure(c(1496300400, 1496300400, 1496300400
    ), class = c("POSIXct", "POSIXt"), tzone = "Etc/GMT+1"), compstcode = c(3456, 
    1267, 5488), StationCode = c(1002, 1002, 1002), ServiceType = structure(c(1L, 
    2L, 1L), .Label = c("Self-Service", "Served"), class = "factor"), 
        Product = structure(c(1L, 2L, 1L), .Label = c("GG", "SP"), class = "factor"), 
        unitprice = c(1.425, 1.406, 1.011), brand = structure(c(2L, 
        1L, 3L), .Label = c("BP", "Shell", "Total"), class = "factor")), class = "data.frame", row.names = c(NA, 
    -3L))
    

    我想从df2中提取compstcode、uprice和brand到df1,这样我们的compstcode和brand就不为空,而uprice对于df1的所有日期和时间都为空。

    1 回复  |  直到 6 年前
        1
  •  1
  •   Mako212    6 年前

    我认为您基本上希望基于多个条件连接这两个数据帧,但如果这不能捕获您要做的事情,请告诉我。

    library(tidyverse)
    
    df1 %>% left_join(., df2 %>% select(-date), by = c("StationCode", "ServiceType", "Product"))
    
                     date StationCode  ServiceType Product unitprice.x compstcode unitprice.y brand
    1 2017-06-01 06:00:00        1002 Self-Service      GG       1.345       3456       1.425 Shell
    2 2017-06-01 06:00:00        1002 Self-Service      GG       1.345       5488       1.011 Total
    3 2017-06-01 07:00:00        1002       Served      GG       1.456         NA          NA  <NA>
    4 2017-06-01 07:00:00        1002 Self-Service      SP       1.012         NA          NA  <NA>
    

    但您不必排除日期,它也可以是(为清晰起见,重命名列):

    df1 %>% left_join(., df2 %>% rename(compDate = date, compunitprice = unitprice), by = c("StationCode", "ServiceType", "Product"))