代码之家  ›  专栏  ›  技术社区  ›  Akki

基于日期范围合并多个数据帧

  •  2
  • Akki  · 技术社区  · 7 年前
    • 我有这些数据

      id<-c("1","3")
      Outcome<-c("Balanced","Balanced")  
      FromDate<-as.Date(c("2016-01-01","2016-01-01"),'%Y-%m-%d')
      ToDate<-as.Date(c("2017-01-01","2017-01-01"),'%Y-%m-%d')
      type<-c("ccc")
      data<-data.frame(id,Outcome,FromDate,ToDate,type) 
      R> data
      
         id  Outcome    FromDate     ToDate     type  
      
         1   Balanced   2016-01-01   2017-01-01  ccc  
         3   Balanced   2016-01-01   2017-01-01  ccc
      
      refno<-c("1","2","1","1")
      sedolnumber<-c("ABC123","XYZ12","ABC123","ZZZ123") 
      order_placement_date<-as.Date(c("2016-02-01","2017-02-05","2017-02-01","2016-04-01"),'%Y-%m-%d')
      units_buyed<-c("1000","200","1000","1000")
      buy<-data.frame(refno,sedolnumber,order_placement_date,units_buyed)
      
      R> buy   
      refno sedolnumber order_placement_date units_buyed
         1      ABC123           2016-02-01        1000
         2       XYZ12           2017-02-05         200
         1      ABC123           2017-02-01        1000
         1      ZZZ123           2016-04-01        1000
      
      refno<-c("1","1")
      sedolnumber<-c("ABC123","ABC123") 
      sell_placement_date<-as.Date(c("2016-05-01","2017-05-01"),'%Y-%m-%d')
      units_sold<-c("500","500")
      sell<-data.frame(refno,sedolnumber,sell_placement_date,units_sold)
      
      R> sell   
      refno sedolnumber sell_placement_date units_sold
        1      ABC123          2016-05-01        500  
        1      ABC123          2017-05-01        500
      
    • 我想根据条件连接所有三个表,并再添加一列 保留的单位 这将是购买列的减法。units\u买卖。售出单位:

      1. 数据id=购买。参考编号
      2. 购买订单放置日期>=数据FromDate AND buy。订单放置日期<数据ToDate公司
      3. 数据id=销售。参考编号
      4. 购买sedolnumber=卖出。塞多尔数
    • 保留的单位列应显示 购买的单位 如果售出的units\u为空,并且 如果units\u buyed和units\u Seld均为空

    • 我使用sqldf包来实现这一点。R中是否有任何函数可以在不使用sqldf的情况下实现这一点。 数据 表是我的父表,如果未找到买卖表中的匹配记录,则应显示空白值。

      R>sqldf("SELECT a.id,a.outcome,a.FromDate,a.ToDate,a.type,b.sedolnumber,b.order_placement_date,b.units_buyed,c.units_sold,c.sell_placement_date,(b.units_buyed-c.units_sold) as Units_Retained 
       FROM data a LEFT JOIN buy b ON (a.id=b.refno AND b.order_placement_date>=a.FromDate AND b.order_placement_date<a.ToDate) 
       LEFT JOIN sell c ON(a.id=c.refno AND c.sell_placement_date>=a.FromDate AND c.sell_placement_date<a.ToDate AND b.sedolnumber=c.sedolnumber)  ") 
      
      
      
      R>  id  Outcome   FromDate     ToDate   type sedolnumber order_placement_date units_buyed units_sold sell_placement_date Units_Retained
          1    Balanced 2016-01-01 2017-01-01  ccc      ABC123           2016-02-01        1000        500          2016-05-01            500
          1    Balanced 2016-01-01 2017-01-01  ccc      ZZZ123           2016-04-01        1000       <NA>                <NA>             NA
          3    Balanced 2016-01-01 2017-01-01  ccc        <NA>                 <NA>        <NA>       <NA>                <NA>             NA
      
    2 回复  |  直到 7 年前
        1
  •  3
  •   amarchin    7 年前

    您可以使用 dplyr 动词

    library(dplyr)
    
    get_units_retained <- function(units_buyed, units_sold) {
    
      units_buyed <- as.numeric(as.character(units_buyed))
      units_sold <- as.numeric(as.character(units_sold))
    
      if_else(is.na(units_buyed), 0, units_buyed) - if_else(is.na(units_sold), 0, units_sold)
    
    }
    
    
    left_join(data, buy, by = c("id" = "refno")) %>% 
      left_join(sell, by = c("id" = "refno", "sedolnumber")) %>% 
      filter(
          (order_placement_date >= FromDate & order_placement_date < ToDate) | is.na(order_placement_date), 
          (sell_placement_date >= FromDate & sell_placement_date < ToDate) | is.na(sell_placement_date)
      ) %>% 
      mutate(Units_Retained = get_units_retained(units_buyed, units_sold))
    
        2
  •  0
  •   Alex    7 年前

    您可以使用 data.table 如果您使用的是非常大的数据:

    require(data.table)
    data <- as.data.table(data)
    buy <- as.data.table(buy)
    sell <- as.data.table(sell)
    setkey(data,id)
    setkey(buy,refno)
    setkey(sell,refno,sedolnumber)
    dd <- setkey(data[buy,nomatch = 0],
           id,
           sedolnumber
           )[
             sell,nomatch = 0
             ][
               order_placement_date >= FromDate &  order_placement_date < ToDate &
                 sell_placement_date >= FromDate & sell_placement_date < ToDate,
               ][,
                 Units_Retained := as.numeric(as.character(units_buyed)) - as.numeric(as.character(units_sold))
                 ]