
How to mutate a data.table with many columns?

  •  -1
  • shy zhan  · Tech Community  · 6 years ago

    I have a data.table that contains the columns P01 to P20 and S01 to S20. Now I want to add 20 more columns with the mutate function:

    library(dplyr)
    
    production50m <- mutate(production50m, S01_P01 = S01 / P01)
    production50m <- mutate(production50m, S02_P02 = S02 / P02)
    production50m <- mutate(production50m, S03_P03 = S03 / P03)
    production50m <- mutate(production50m, S04_P04 = S04 / P04)
    production50m <- mutate(production50m, S05_P05 = S05 / P05)
    production50m <- mutate(production50m, S06_P06 = S06 / P06)
    production50m <- mutate(production50m, S07_P07 = S07 / P07)
    production50m <- mutate(production50m, S08_P08 = S08 / P08)
    production50m <- mutate(production50m, S09_P09 = S09 / P09)
    production50m <- mutate(production50m, S10_P10 = S10 / P10)
    production50m <- mutate(production50m, S11_P11 = S11 / P11)
    production50m <- mutate(production50m, S12_P12 = S12 / P12)
    production50m <- mutate(production50m, S13_P13 = S13 / P13)
    production50m <- mutate(production50m, S14_P14 = S14 / P14)
    production50m <- mutate(production50m, S15_P15 = S15 / P15)
    production50m <- mutate(production50m, S16_P16 = S16 / P16)
    production50m <- mutate(production50m, S17_P17 = S17 / P17)
    production50m <- mutate(production50m, S18_P18 = S18 / P18)
    production50m <- mutate(production50m, S19_P19 = S19 / P19)
    production50m <- mutate(production50m, S20_P20 = S20 / P20)
    

    Obviously, writing 20 lines of code is not a smart approach. Is there a better way? Thanks in advance!

    3 answers  |  last activity 6 years ago
        1
  •  4
  •   www    6 years ago

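    Here is a base R solution that avoids typing the mutate command many times. Assume the data frame is called dat and has 40 columns: 20 whose names begin with S and another 20 whose names begin with P. dat2 is the final output.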

    # Select the columns from S01 to S20
    dat_S <- dat[, sprintf("S%02d", 1:20)]
    
    # Select the columns from P01 to P20
    dat_P <- dat[, sprintf("P%02d", 1:20)]
    
    # Calculate the new columns
    dat_SP <- dat_S/dat_P
    
    # Rename the columns
    names(dat_SP) <- paste(sprintf("S%02d", 1:20), sprintf("P%02d", 1:20), sep = "_")
    
    # Combine dat_SP with the original data frame
    dat2 <- cbind(dat, dat_SP)
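    If the number of S/P pairs may change, the hard-coded 1:20 can be replaced by deriving the column names from the data itself. This is a minimal sketch, assuming every S column has a P partner with the same two-digit suffix; the regular expression and the helper variables S_cols, P_cols and ratios are illustrative choices, not part of the original answer, and only 3 pairs are used to keep it short.

```r
# Small stand-in data frame with 3 S/P pairs (the question has 20)
set.seed(1)
dat <- as.data.frame(matrix(runif(18), ncol = 6))
names(dat) <- c(sprintf("S%02d", 1:3), sprintf("P%02d", 1:3))

# Find the S columns by name and build the matching P column names
S_cols <- grep("^S[0-9]{2}$", names(dat), value = TRUE)
P_cols <- sub("^S", "P", S_cols)

# Divide element-wise, name the results S01_P01, S02_P02, ... and append them
ratios <- dat[S_cols] / dat[P_cols]
names(ratios) <- paste(S_cols, P_cols, sep = "_")
dat2 <- cbind(dat, ratios)
```

    The division still relies on the two column selections lining up pair by pair, which holds here because P_cols is built directly from S_cols.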
    

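    If you are really working with a data.table, we can still use the same strategy. Note that selecting columns by name works differently than with a regular data frame.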

    library(data.table)
    
    # Convert to data.table
    setDT(dat)
    
    # Select the columns from S01 to S20
    S_cols <- sprintf("S%02d", 1:20)
    dat_S <- dat[, ..S_cols]
    
    # Select the columns from P01 to P20
    P_cols <- sprintf("P%02d", 1:20)
    dat_P <- dat[, ..P_cols]
    
    # Calculate the new columns
    dat_SP <- dat_S/dat_P
    
    # Rename the columns
    names(dat_SP) <- paste(sprintf("S%02d", 1:20), sprintf("P%02d", 1:20), sep = "_")
    
    # Combine dat_SP with the original data.table
    dat2 <- cbind(dat, dat_SP)
    
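    With data.table one can also add all the new columns by reference with :=, which avoids building a separate table and calling cbind. This is a sketch of that alternative technique (not the answer's own method), assuming the same S/P naming; the variable names SP_cols and ratios are illustrative, and only 3 pairs are used.

```r
library(data.table)

# Small stand-in data.table with 3 S/P pairs (the question has 20)
set.seed(1)
dat <- as.data.table(matrix(runif(18), ncol = 6))
setnames(dat, c(sprintf("S%02d", 1:3), sprintf("P%02d", 1:3)))

S_cols  <- sprintf("S%02d", 1:3)
P_cols  <- sprintf("P%02d", 1:3)
SP_cols <- paste(S_cols, P_cols, sep = "_")

# One numeric vector per pair: S01 / P01, S02 / P02, ...
ratios <- Map(function(s, p) dat[[s]] / dat[[p]], S_cols, P_cols)

# Assign all new columns in place (by reference), no copy of dat
dat[, (SP_cols) := ratios]
```

    Wrapping SP_cols in parentheses tells data.table to treat it as a character vector of column names rather than as a single column literally called SP_cols.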

    Data

    set.seed(4749)
    
    dat <- as.data.frame(matrix(runif(120), ncol = 40))
    names(dat) <- c(sprintf("S%02d", 1:20), sprintf("P%02d", 1:20))
    
        2
  •  4
  •   Melissa Key    6 years ago

    Here is an approach using dplyr, purrr and rlang:

    library(dplyr)
    library(purrr)
    library(rlang)
    library(stringr) # for str_pad function
    
    # list of the variables you want to combine
    var_names <- map(c("S", "P"), ~ paste0(., str_pad(1:20, 2, side = 'left', pad = '0')))
    
    # create fake df since no data provided
    df <- unlist(var_names) %>% 
      map_dfc(.f = function(x) {
        tibble(!!x := rnorm(100, 40, 2))
      })
    
    # solution - there are places this could be fancier, but this gets the job done
    df2 <- map2_dfc(var_names[[1]], var_names[[2]], .f = function(x, y) {
      var_name <- paste(x, y, sep = "_")
      tibble(!!var_name := df[[x]] / df[[y]])
    }) %>%
      bind_cols(df, .)
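    Staying closer to the question's own mutate() call, the 20 divisions can also be built as quoted expressions and spliced into one mutate() with the !!! operator. This is a minimal sketch, assuming a data frame named production50m with the question's S/P columns (only 3 pairs here); ratio_exprs is an illustrative name, and the expressions are built with base R's call() and as.name().

```r
library(dplyr)

# Small stand-in for production50m with 3 S/P pairs (the question has 20)
set.seed(1)
production50m <- as.data.frame(matrix(runif(18), ncol = 6))
names(production50m) <- c(sprintf("S%02d", 1:3), sprintf("P%02d", 1:3))

S_cols <- sprintf("S%02d", 1:3)
P_cols <- sprintf("P%02d", 1:3)

# Build one unevaluated call per pair, e.g. S01 / P01, named "S01_P01"
ratio_exprs <- Map(function(s, p) call("/", as.name(s), as.name(p)), S_cols, P_cols)
names(ratio_exprs) <- paste(S_cols, P_cols, sep = "_")

# Splice all named expressions into a single mutate() call
production50m <- mutate(production50m, !!!ratio_exprs)
```

    The names of the list become the new column names, so this is equivalent to writing out S01_P01 = S01 / P01, S02_P02 = S02 / P02, ... by hand.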
    
        3
  •  0
  •   KamRa    6 years ago

    You can refer to the nth column like this:

    myTable[,n]
    

    where n is the column number. Use a loop to iterate over the columns you want to process. For example:

    for (n in 1:ncol(myTable)) {
      myTable[, n] <- myTable[, n]  # put what you want the column to be here
    }
    

    You cannot add new columns to the table this way. Instead, you can first add a blank column to the table:

    myTable$name_of_new_column <- NA
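    A loop can also create the new columns directly if it indexes them by name instead of by position. This is a sketch assuming myTable is a plain data.frame with the question's S01..S20 and P01..P20 columns; only 3 pairs are used here, and the loop variables s and p are illustrative.

```r
# Small stand-in table with 3 S/P pairs (the question has 20)
set.seed(1)
myTable <- as.data.frame(matrix(runif(18), ncol = 6))
names(myTable) <- c(sprintf("S%02d", 1:3), sprintf("P%02d", 1:3))

for (i in 1:3) {
  s <- sprintf("S%02d", i)
  p <- sprintf("P%02d", i)
  # Assigning to a column name that does not exist yet appends a new column
  myTable[[paste(s, p, sep = "_")]] <- myTable[[s]] / myTable[[p]]
}
```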