我有两个数据集,它们的dput如下:
场景:DF1包含特定日期和时间的单价数据。DF2包含DF1中每个站点代码的竞争对手,以及价格和竞争对手品牌名称。DF2没有DF1中日期和时间的所有单价。我需要以某种方式将df2合并到df1中,这样我就可以获得日期时间、产品、服务类型和品牌,但竞争对手的单价不适用,这样我就可以在后面输入这些值。
我想做的是:
-
根据基于类似站点代码、产品和服务类型的DF1中的值,找出DF2中缺少的日期时间单价。DF1中的每个站点代码在DF2中都有多个“站点代码”,因为可以有多个竞争对手。
-
以某种方式提取和合并这些信息,以便我可以在之后输入值。
假设DF1是
date StationCode ServiceType Product unitprice
1 2017-06-01 06:00:00 1002 Self-Service GG 1.345
2 2017-06-01 07:00:00 1002 Served GG 1.456
3 2017-06-01 07:00:00 1002 Self-Service SP 1.012
dput(df1)
structure(list(date = structure(c(1496300400,
1496304000, 1496304000 ), class = c("POSIXct", "POSIXt"), tzone =
"Etc/GMT+1"), StationCode = c(1002, 1002, 1002), ServiceType =
structure(c(1L, 2L, 1L), .Label = c("Self-Service", "Served"), class
= "factor"), Product = structure(c(1L, 1L, 2L ), .Label = c("GG", "SP"), class = "factor"), unitprice = c(1.345,
1.456, 1.012)), class = "data.frame", row.names = c(NA, -3L))
DF2是
date compstcode StationCode ServiceType Product unitprice brand
1 2017-06-01 06:00:00 3456 1002 Self-Service GG 1.425 Shell
2 2017-06-01 06:00:00 1267 1002 Served SP 1.406 BP
3 2017-06-01 06:00:00 5488 1002 Self-Service GG 1.011 Total
dput(df2)
structure(list(date = structure(c(1496300400, 1496300400, 1496300400
), class = c("POSIXct", "POSIXt"), tzone = "Etc/GMT+1"), compstcode = c(3456,
1267, 5488), StationCode = c(1002, 1002, 1002), ServiceType = structure(c(1L,
2L, 1L), .Label = c("Self-Service", "Served"), class = "factor"),
Product = structure(c(1L, 2L, 1L), .Label = c("GG", "SP"), class = "factor"),
unitprice = c(1.425, 1.406, 1.011), brand = structure(c(2L,
1L, 3L), .Label = c("BP", "Shell", "Total"), class = "factor")), class = "data.frame", row.names = c(NA,
-3L))
我想从df2中提取compstcode、uprice和brand到df1,这样我们的compstcode和brand就不为空,而uprice对于df1的所有日期和时间都为空。