
Using extract() to reshape a data frame from wide to long format

  • val  · 7 years ago

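    I'm trying to use extract() to add an "Error" column next to the measurement values. However, I think I'm getting the regular expression and/or the extract() syntax wrong. Any help would be greatly appreciated.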

    Ideally, I would end up with a long-format data frame with these columns:

    Reading Category Measurement Error Sample
    

    Reproducible code:

    Reading <- c(1,2,3,4)
    Cat1 <- runif(4)*10
    Cat1_err <- runif(4)/10
    Cat2 <- runif(4)*10
    Cat2_err <- runif(4)/10
    Cat3 <- runif(4)*10
    Cat3_err <- runif(4)/10
    Sample <- c("X14","X23","X11","X10")
    df_wide <- data.frame(Reading,Cat1,Cat1_err,Cat2,Cat2_err,Cat3,Cat3_err,Sample)
    df_wide
      Reading     Cat1   Cat1_err     Cat2   Cat2_err     Cat3   Cat3_err Sample
    1       1 7.375116 0.01014747 2.234376 0.08978868 5.373709 0.02245759    X14
    2       2 5.097937 0.07036843 5.691806 0.05561866 1.823026 0.07658357    X23
    3       3 2.034116 0.01689391 8.192971 0.03844054 4.242167 0.01036751    X11
    4       4 9.129536 0.09130868 5.908125 0.05505775 5.747843 0.05774527    X10
    
    library(tidyverse)
    df_long <- df_wide %>% 
      gather(key=Category, value=Measurement, Cat1:Cat3_err, factor_key = TRUE) %>%
      extract(Measurement,c("Meas","Error"),"Cat\\d_err", remove=FALSE)
    
    
        Error in names(l) <- enc2utf8(into) : 
      'names' attribute [2] must be the same length as the vector [0]
    
    2 answers
        1
  •   www    7 years ago

    I don't think you want extract here. I think separate and spread are what you need.

    library(tidyverse)
    
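    df_long <- df_wide %>% 
      # Stack all Cat* columns into key/value pairs
      gather(key=Category, value=Measurement, Cat1:Cat3_err, factor_key = TRUE) %>%
      # Split "Cat1_err" into "Cat1" and "err"; plain "Cat1" gets Type = NA
      # (fill = "right" silences the missing-pieces warning)
      separate(Category, into = c("Category", "Type"), fill = "right") %>%
      mutate(Type = ifelse(is.na(Type), "Measurement", "Error")) %>%
      # One column per Type: Measurement and Error
      spread(Type, Measurement) %>%
      select(Reading, Category, Measurement, Error, Sample)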
    df_long
       Reading Category Measurement       Error Sample
    1        1     Cat1   0.8453114 0.074961215    X14
    2        1     Cat2   4.5962112 0.059012908    X14
    3        1     Cat3   5.4100838 0.076049726    X14
    4        2     Cat1   4.5956145 0.016215603    X23
    5        2     Cat2   1.7768868 0.040258838    X23
    6        2     Cat3   1.9597101 0.027356213    X23
    7        3     Cat1   1.6204584 0.057760820    X11
    8        3     Cat2   4.9478913 0.054855327    X11
    9        3     Cat3   2.9670444 0.004276482    X11
    10       4     Cat1   0.1831593 0.038415489    X10
    11       4     Cat2   2.5716471 0.024932980    X10
    12       4     Cat3   8.5517659 0.015378512    X10
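
    With tidyr 1.0 or later, the same reshape can probably be written with pivot_longer() instead of gather() and spread(). The following is only a sketch under some assumptions: df_demo is a small hypothetical stand-in for df_wide with fixed values, and the columns are renamed first so that every name has the form Category_Type, which lets the special ".value" name do the spreading.

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for df_wide (fixed values instead of runif())
df_demo <- data.frame(
  Reading  = c(1, 2),
  Cat1     = c(7.4, 5.1),
  Cat1_err = c(0.01, 0.07),
  Cat2     = c(2.2, 5.7),
  Cat2_err = c(0.09, 0.06),
  Sample   = c("X14", "X23")
)

df_demo_long <- df_demo %>%
  # Cat1 -> Cat1_Measurement, so every column name has two parts
  rename_with(~ paste0(.x, "_Measurement"), matches("^Cat\\d$")) %>%
  # Cat1_err -> Cat1_Error
  rename_with(~ sub("_err$", "_Error", .x), ends_with("_err")) %>%
  # First part of the name becomes Category; the second part (".value")
  # becomes the output column name (Measurement or Error)
  pivot_longer(
    cols = starts_with("Cat"),
    names_to = c("Category", ".value"),
    names_sep = "_"
  ) %>%
  select(Reading, Category, Measurement, Error, Sample)

print(df_demo_long)
```

    This gives one row per Reading and Category, with Measurement and Error side by side, without the intermediate Type column.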
    
        2
  •   David Klotz    7 years ago

    There may be a quicker way to do this, but this seems to do what you need:

    df_wide %>% 
      gather(key=Category, value=Measurement, Cat1:Cat3_err, factor_key = TRUE) %>%
      extract(Category,c("Meas","Error"),"(Cat\\d)[_]*([a-z]*)")  %>% 
      spread(key = Error, value = Measurement)
    

    Note, among other things, that regular expressions in R need \\d: the backslash has to be doubled inside an R string.
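
    To illustrate the two points that matter for extract(), here is a minimal sketch on a small hypothetical data frame (keys is made up for the example). The regex must contain one capture group per name in into, which is likely why the original call failed: its pattern had no groups, so zero pieces were available for two names. The doubled backslash is how the regex \d is written in R source.

```r
library(tidyr)

# Hypothetical column of keys, like the Category column after gather()
keys <- data.frame(
  Category = c("Cat1", "Cat1_err", "Cat2", "Cat2_err"),
  stringsAsFactors = FALSE
)

# "\\d" in R source is the two-character regex \d
stopifnot(nchar("\\d") == 2)

# Two capture groups -> two names in `into`
parts <- extract(
  keys, Category,
  into  = c("Meas", "Type"),
  regex = "(Cat\\d)_?([a-z]*)"
)

print(parts)
```

    Here Meas holds the category name and Type holds "err" for the error columns.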