Replace values based on the values of two other columns

r
  •  0
  • Hadsga  · 1 year ago

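    I have these 3 columns. There are some NA values in "Basiswert". Where "Basiswert" is NA, I want to replace it with the Basiswert of another row that has the same number in column "D" or "E". For example: row 2 has an NA in Basiswert. It must be replaced with USDCAD, taken from row 3, which has the same value in column E (12442).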

    structure(list(D = c("12449", "12448", "12447", "12446", "12442", 
    "12441", "12440", "12439", "12438", "12437"), E = c("0", "12442", 
    "12442", "12430", "0", "12430", "12436", "12436", "12430", "12430"
    ), Basiswert = c("EURJPY", NA, "USDCAD", "EURAUD", "USDCAD", 
    "EURAUD", NA, "GBPJPY", NA, "EURAUD")), class = "data.frame", row.names = c(NA, 
    -10L))
    

    Edit:

    library(dplyr)
    
    data %>% mutate(Basiswert = ifelse(is.na(Basiswert) == FALSE & 
    D == E, Basiswert, NA))
    

    This does not work.

    2 replies  |  as of 1 year ago
        1
  •  3
  •   Hann Shaw    1 year ago

    The question is not entirely clear to me. Check whether this works.

    library(dplyr)

    # Within each E group, fill NA with the (alphabetically) largest non-NA value
    data %>%
      group_by(E) %>%
      mutate(Basiswert2 = ifelse(is.na(Basiswert), max(Basiswert, na.rm = TRUE), Basiswert))
    
        2
  •  1
  •   hello_friend    1 year ago

    Base R solution (in the sample data only the values of E repeat, so we can ignore D):

    # Split-apply-combine by column E values: 
    # res_df => data.frame
    res_df <- data.frame(
      do.call(
        rbind,
        # For each E value: 
        lapply(
          with(df, split(df, E)),
          function(x){
            # If the first value in the vector is na:
            if(is.na(x$Basiswert[1])){
              # Resolve the first non na value: ir => vector length 1
              ir <- x$Basiswert[min(which(!(is.na(x$Basiswert))))]
              # Fill the first na value with first non-na value: 
              x$Basiswert[1] <- ir
            # Otherwise: 
            }else{
              # Do nothing: 
              invisible()
            } 
            # Fill the values down: x$Basiswert => vector
            x$Basiswert <- na.omit(x$Basiswert)[cumsum(!(is.na(x$Basiswert)))]
            # Return the data.frame: data.frame => env
            x
          }
        )
      ),
      row.names = NULL
    )
    

    Tidyverse solution, in case D also needs to be taken into account:

    library(dplyr)
    library(tidyr)
    df %>%
      group_by(E) %>%
      fill(Basiswert, .direction = "downup") %>% 
      group_by(D) %>% 
      fill(Basiswert, .direction = "downup")
    

    Input data:

    df <- structure(
      list(
        D = c("12449", "12448", "12447", "12446", "12442", 
        "12441", "12440", "12439", "12438", "12437"), 
        E = c("0", "12442", 
        "12442", "12430", "0", "12430", "12436", "12436", "12430", "12430"
        ), 
        Basiswert = c("EURJPY", NA, "USDCAD", "EURAUD", "USDCAD", "EURAUD", NA, "GBPJPY", NA, "EURAUD")
      ), 
      class = "data.frame", 
      row.names = c(NA, -10L)
    )
    
        3
  •  1
  •   r2evans    1 year ago

    Another base R solution:

    data$Basiswert2 <- ave(data$Basiswert, data$E, FUN = function(z) replace(z, is.na(z), max(z, na.rm = TRUE)))
    data
    #        D     E Basiswert Basiswert2
    # 1  12449     0    EURJPY     EURJPY
    # 2  12448 12442      <NA>     USDCAD
    # 3  12447 12442    USDCAD     USDCAD
    # 4  12446 12430    EURAUD     EURAUD
    # 5  12442     0    USDCAD     USDCAD
    # 6  12441 12430    EURAUD     EURAUD
    # 7  12440 12436      <NA>     GBPJPY
    # 8  12439 12436    GBPJPY     GBPJPY
    # 9  12438 12430      <NA>     EURAUD
    # 10 12437 12430    EURAUD     EURAUD
    

    Since you said you want to group by both D and E, I suggest this can be done as a chain:

    data |>
      transform(Basiswert = ave(Basiswert, D, FUN = function(z) ifelse(is.na(z), na.omit(z)[1], z))) |>
      transform(Basiswert = ave(Basiswert, E, FUN = function(z) ifelse(is.na(z), na.omit(z)[1], z)))
    #        D     E Basiswert
    # 1  12449     0    EURJPY
    # 2  12448 12442    USDCAD
    # 3  12447 12442    USDCAD
    # 4  12446 12430    EURAUD
    # 5  12442     0    USDCAD
    # 6  12441 12430    EURAUD
    # 7  12440 12436    GBPJPY
    # 8  12439 12436    GBPJPY
    # 9  12438 12430    EURAUD
    # 10 12437 12430    EURAUD
    

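    Both ifelse(is.na(z), .., z) and replace(z, is.na(z), ..) are effectively the same here. The only reason I'd shy away from ifelse is if your data is not int/num/chr: for example, if you apply this logic to dates or timestamps, ifelse should not be used.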
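
    A minimal sketch of that difference, using made-up Date values that are not part of the question's data: ifelse() drops the Date class and returns the underlying numbers, while replace() keeps the class.

```r
# Illustrative dates only (assumed for this example, not from the question)
d    <- as.Date(c("2024-01-01", NA, "2024-01-03"))
fill <- as.Date("2024-01-02")

# ifelse() strips attributes, so the result is a plain numeric vector
via_ifelse <- ifelse(is.na(d), fill, d)

# replace() assigns into the original vector, so the Date class survives
via_replace <- replace(d, is.na(d), fill)

class(via_ifelse)   # "numeric"
class(via_replace)  # "Date"
```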

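    Using max(..) to determine the replacement value is a bit of a hack on strings; it emits a warning in some cases (it did for me here), whereas na.omit(z)[1] always returns the first non-NA value, or NA if there is none.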
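
    A small sketch of that behaviour. first_non_na is a hypothetical helper name used only for illustration, and the group "99999" is made up to show the all-NA case:

```r
# Hypothetical helper: first non-NA element, or NA if there is none
first_non_na <- function(z) na.omit(z)[1]

first_non_na(c(NA, "USDCAD", "EURAUD"))        # "USDCAD"
first_non_na(c(NA_character_, NA_character_))  # NA

# Applied per group with ave(); group "99999" has no non-NA value
E <- c("12442", "12442", "12436", "12436", "99999")
B <- c(NA, "USDCAD", NA, "GBPJPY", NA)
filled <- ave(B, E, FUN = function(z) ifelse(is.na(z), first_non_na(z), z))
filled
# "USDCAD" "USDCAD" "GBPJPY" "GBPJPY" NA
```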

        4
  •  0
  •   TarJae    1 year ago

    Same logic as @Hann Shaw; we can also use replace:

    library(dplyr)
    
    df %>%
      mutate(Basiswert2 = replace(Basiswert, is.na(Basiswert), max(Basiswert, na.rm = TRUE)), .by=E)
    
         D     E Basiswert Basiswert2
    1  12449     0    EURJPY     EURJPY
    2  12448 12442      <NA>     USDCAD
    3  12447 12442    USDCAD     USDCAD
    4  12446 12430    EURAUD     EURAUD
    5  12442     0    USDCAD     USDCAD
    6  12441 12430    EURAUD     EURAUD
    7  12440 12436      <NA>     GBPJPY
    8  12439 12436    GBPJPY     GBPJPY
    9  12438 12430      <NA>     EURAUD
    10 12437 12430    EURAUD     EURAUD