
How to make a row into column names and split a string into multiple rows

r
  •  0
  • neuron  · Tech Community  · 6 years ago

    I wasn't sure how to phrase the title, so I did my best. Here is an example of my data set, which we can call my_data:

    tibble::tribble(
      ~Pathway, ~log_value, ~ratio, ~z_score,                ~molecules,
         "GHR",      "N/A",  "N/A",    "N/A", "CD40LG,TGFBR1,MYH9,MMP1",
        "TGFB",      "N/A",  "N/A",    "N/A", "ADAMTS8,PIK3R1,HRAS,SEM",
         "PKA",      "N/A",  "N/A",    "N/A", "PIK3CA,PDGFA,PIK3R1,SPH",
         "PKB",      "N/A",  "N/A",    "N/A", "MAST2,PIK3CA,TGFBR1,BAD",
         "PKC",      "N/A",  "N/A",    "N/A", "TGFBR1,AKAP9,CAMK2A,PHK"
      )
    

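    What I want to do is turn column 1 into a row and use its values as the header of each column. I also want to split column 5 into multiple rows. This is what I have in mind: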

    GHR TGFB PKA PKB PKC
    CD40LG ADAMTS8 PIK3CA MAST2 TGFBR1
    TGFBR1 PIK3R1 PDGFA PIK3CA AKAP9
    MYH9 HRAS PIK3R1 TGFBR1 CAMK2A
    MMP1 SEM SPH BAD PHK
    

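    I don't really need columns 2, 3, or 4, so I dropped them with my_data <- my_data[c(1,5)]. I then tried my_data$molecules <- as.character(gsub(","," ",my_data$molecules)), which gave me trouble, but you may not need that step. All I want is to use column 1 as the header and split column 5 into multiple rows, and I'm having a hard time doing it. Does anyone have any suggestions? Thanks in advance.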

    3 answers  |  as of 6 years ago
        1
  •  1
  •   phil_t    6 years ago

    df = df[, c(1, 5)]
    
    ## Split on comma and add to dataframe
    tmp = strsplit(df$molecules, ",")
    df = cbind(df[, -2], do.call(rbind, tmp))
    
    ## Transpose the dataframe
    df = t(df)
    rownames(df) = NULL
    
    ## Promote the first row (the pathway names) to column names
    colnames(df) = df[1, ]
    df = df[-1, ]
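
    If the pathways in the real data do not all list the same number of molecules, binding the split vectors with rbind recycles the shorter ones and emits a warning. The following is only a minimal base R sketch of one way around that, padding shorter pathways with NA. The data frame here is a shortened, hypothetical variant of the question's data (two molecules were removed on purpose), and the names parts, n, padded and out are illustrative.

```r
# Hypothetical variant of the question's data: TGFB and PKA are
# deliberately shortened so the pathways have unequal lengths.
my_data <- data.frame(
  Pathway   = c("GHR", "TGFB", "PKA"),
  molecules = c("CD40LG,TGFBR1,MYH9,MMP1",
                "ADAMTS8,PIK3R1",
                "PIK3CA,PDGFA,PIK3R1"),
  stringsAsFactors = FALSE
)

# Split each comma-separated string into a character vector
parts <- strsplit(my_data$molecules, ",", fixed = TRUE)

# Pad every vector with NA up to the length of the longest one
n <- max(lengths(parts))
padded <- lapply(parts, function(x) {
  length(x) <- n
  x
})

# One column per pathway, named after the Pathway column
out <- as.data.frame(setNames(padded, my_data$Pathway),
                     stringsAsFactors = FALSE)
print(out)
```

    With equal-length pathways, as in the question, the padding step changes nothing and the result matches the layout the question asks for.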
    
        2
  •  1
  •   dmi3kno    6 years ago

    df <- tibble::tribble(
          ~Pathway, ~log_value, ~ratio, ~z_score,                ~molecules,
             "GHR",      "N/A",  "N/A",    "N/A", "CD40LG,TGFBR1,MYH9,MMP1",
            "TGFB",      "N/A",  "N/A",    "N/A", "ADAMTS8,PIK3R1,HRAS,SEM",
             "PKA",      "N/A",  "N/A",    "N/A", "PIK3CA,PDGFA,PIK3R1,SPH",
             "PKB",      "N/A",  "N/A",    "N/A", "MAST2,PIK3CA,TGFBR1,BAD",
             "PKC",      "N/A",  "N/A",    "N/A", "TGFBR1,AKAP9,CAMK2A,PHK"
          )
    

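    Using dplyr and tidyr:

    library(dplyr)
    library(tidyr)
    
    df %>% select(Pathway, molecules) %>% 
      separate_rows(molecules,sep=",") %>%   # one row per molecule
      group_by(Pathway) %>% 
      mutate(id=1:n()) %>%                   # position of each molecule within its pathway
      spread(key="Pathway", value="molecules") %>% 
      select(-id)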
    
    #> # A tibble: 4 x 5
    #>   GHR    PKA    PKB    PKC    TGFB   
    #>   <chr>  <chr>  <chr>  <chr>  <chr>  
    #> 1 CD40LG PIK3CA MAST2  TGFBR1 ADAMTS8
    #> 2 TGFBR1 PDGFA  PIK3CA AKAP9  PIK3R1 
    #> 3 MYH9   PIK3R1 TGFBR1 CAMK2A HRAS   
    #> 4 MMP1   SPH    BAD    PHK    SEM    
    

    The id column gives spread a unique key for each row within a pathway; the final select drops it again.
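
    In current tidyr releases spread() is marked as superseded in favour of pivot_wider(). The following is a sketch of the same pipeline rewritten with pivot_wider(), assuming a tidyr version that provides it (1.0.0 or later); the name wide is illustrative. Unlike spread(), pivot_wider() keeps the new columns in order of first appearance by default, which matches the column order shown in the question.

```r
library(dplyr)
library(tidyr)

df <- tibble::tribble(
  ~Pathway, ~molecules,
  "GHR",    "CD40LG,TGFBR1,MYH9,MMP1",
  "TGFB",   "ADAMTS8,PIK3R1,HRAS,SEM",
  "PKA",    "PIK3CA,PDGFA,PIK3R1,SPH",
  "PKB",    "MAST2,PIK3CA,TGFBR1,BAD",
  "PKC",    "TGFBR1,AKAP9,CAMK2A,PHK"
)

wide <- df %>%
  separate_rows(molecules, sep = ",") %>%   # one row per molecule
  group_by(Pathway) %>%
  mutate(id = row_number()) %>%             # position within each pathway
  ungroup() %>%
  pivot_wider(names_from = Pathway, values_from = molecules) %>%
  select(-id)                               # drop the helper key

print(wide)
```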

        3
  •  1
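  •   Onyambu    6 years ago
    # Read the example data; "N/A" is treated as missing
    dat = read.table(stringsAsFactors = FALSE, header = TRUE, na.strings = "N/A",
                     text = "Pathway log_value ratio z_score molecules
      GHR N/A N/A N/A CD40LG,TGFBR1,MYH9,MMP1…
                TGFB N/A N/A N/A ADAMTS8,PIK3R1,HRAS,SEM…
                PKA N/A N/A N/A PIK3CA,PDGFA,PIK3R1,SPH…
                PKB N/A N/A N/A MAST2,PIK3CA,TGFBR1,BAD…
                PKC N/A N/A N/A TGFBR1,AKAP9,CAMK2A,PHK…")
    
    # Re-read the molecules column as comma-separated text, then transpose
    # so that each pathway becomes a column
    a = data.frame(t(read.table(text = dat$molecules, sep = ",")), stringsAsFactors = FALSE)
    
    # Use the pathway names as column names
    setNames(a, dat$Pathway)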
    
          GHR    TGFB    PKA    PKB    PKC
    V1 CD40LG ADAMTS8 PIK3CA  MAST2 TGFBR1
    V2 TGFBR1  PIK3R1  PDGFA PIK3CA  AKAP9
    V3   MYH9    HRAS PIK3R1 TGFBR1 CAMK2A
    V4  MMP1…    SEM…   SPH…   BAD…   PHK…