
R - How to split/combine a column with multiple variables [duplicate]

  •  0
  • user28298  · 6 years ago

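    I'm new to R and haven't found an answer on how to split a column that holds several variables (Samples 1-4) into separate columns while carrying the associated data along. Here is an example: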

    Samples     Content
    Sample 1    70.7
    Sample 1    91.6
    Sample 1    92.6
    Sample 1    65.2
    Sample 1    80.0
    Sample 1    82.1
    Sample 1    88.1
    Sample 1    92.2
    Sample 1    53.3
    Sample 1    80.0
    Sample 1    60.3
    Sample 1    89.7
    Sample 1    84.8
    Sample 1    94.0
    Sample 1    71.8
    Sample 1    76.9
    Sample 1    91.4
    Sample 1    57.9
    Sample 1    61.9
    Sample 1    71.5
    Sample 2    88.7
    Sample 2    67.6
    Sample 2    61.7
    Sample 2    70.8
    Sample 2    45.3
    Sample 2    55.6
    Sample 2    64.6
    Sample 2    62.7
    Sample 2    72.4
    Sample 2    46.8
    Sample 2    59.0
    Sample 2    63.7
    Sample 2    67.0
    Sample 2    71.6
    Sample 2    48.3
    Sample 2    55.6
    Sample 2    62.5
    Sample 2    60.0
    Sample 2    72.9
    Sample 2    47.4
    Sample 3    42.3
    Sample 3    48.2
    Sample 3    64.0
    Sample 3    33.3
    Sample 3    19.0
    Sample 3    41.0
    Sample 3    53.1
    Sample 3    46.5
    Sample 3    30.0
    Sample 3    43.4
    Sample 3    43.7
    Sample 3    92.0
    Sample 3    53.0
    Sample 3    33.0
    Sample 3    48.4
    Sample 3    43.2
    Sample 3    41.8
    Sample 3    62.5
    Sample 3    33.3
    Sample 3    49.3
    Sample 4    51.8
    Sample 4    57.3
    Sample 4    43.3
    Sample 4    42.3
    Sample 4    37.6
    Sample 4    54.9
    Sample 4    71.1
    Sample 4    33.8
    Sample 4    43.1
    Sample 4    39.1
    Sample 4    63.0
    Sample 4    74.0
    Sample 4    31.0
    Sample 4    48.3
    Sample 4    42.9
    Sample 4    62.2
    Sample 4    35.4
    Sample 4    33.8
    Sample 4    40.7
    Sample 4    41.2
    

    I have tried for three days without success. I would like the output to look like this:

    Sample 1    Sample 2    Sample 3    Sample 4
    70.7    88.7    42.3    51.8
    91.6    67.6    48.2    57.3
    92.6    61.7    64.0    43.3
    65.2    70.8    33.3    42.3
    80.0    45.3    19.0    37.6
    82.1    55.6    41.0    54.9
    88.1    64.6    53.1    71.1
    92.2    62.7    46.5    33.8
    53.3    72.4    30.0    43.1
    80.0    46.8    43.4    39.1
    60.3    59.0    43.7    63.0
    89.7    63.7    92.0    74.0
    84.8    67.0    53.0    31.0
    94.0    71.6    33.0    48.3
    71.8    48.3    48.4    42.9
    76.9    55.6    43.2    62.2
    91.4    62.5    41.8    35.4
    57.9    60.0    62.5    33.8
    61.9    72.9    33.3    40.7
    71.5    47.4    49.3    41.2
    

    Many thanks. Once a solution is found, is there also a way to do the reverse?

    Extra: is there any way to run a t-test on data stacked in a single column (as in the first example) without transforming it?

    2 Answers
        1
  •  2
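  •   neilfws    6 years ago
    1. You will probably hit the "duplicate identifiers" error if you use tidyr::spread directly. You first need to create a unique combination of sample and row identifier, which you can do like this (assuming the data frame is named df1):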

      library(tidyverse) # for dplyr + tidyr
      df1 %>% 
        group_by(Samples) %>% 
        mutate(id = row_number()) %>% 
        spread(Samples, Content) %>%
        select(-id)
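
In current tidyr releases spread is superseded by pivot_wider. The following is a minimal sketch of the same reshaping, assuming tidyr 1.0.0 or later; the small df1 built here is only an illustrative subset (the first three values of Sample 1 and Sample 2 from the question), not the full data.

```r
library(dplyr)
library(tidyr)

# Illustrative subset of the question's data (assumption: the real df1 has
# the same two columns, Samples and Content)
df1 <- data.frame(
  Samples = rep(c("Sample 1", "Sample 2"), each = 3),
  Content = c(70.7, 91.6, 92.6, 88.7, 67.6, 61.7)
)

wide <- df1 %>%
  group_by(Samples) %>%
  mutate(id = row_number()) %>%   # position of each value within its sample
  ungroup() %>%
  pivot_wider(names_from = Samples, values_from = Content) %>%
  select(-id)                     # the helper id is no longer needed

wide
```

The id column plays the same role as in the spread version: it tells the reshaping step which values belong on the same output row.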
      
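    2. "If I wanted to do the reverse"

    Do you mean going the other way, from the wide format back to the original long format? Then use gather. Add this to the end of the code above and see what happens: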

    %>% gather(Samples, Content)
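
As a possible alternative to gather, the wide-to-long step can also be sketched with pivot_longer, again assuming tidyr 1.0.0 or later. The wide data frame below is a hypothetical three-row excerpt of the desired output from the question.

```r
library(tidyr)

# Hypothetical excerpt of the wide table; check.names = FALSE keeps the
# space in the column names
wide <- data.frame(
  `Sample 1` = c(70.7, 91.6, 92.6),
  `Sample 2` = c(88.7, 67.6, 61.7),
  check.names = FALSE
)

# Stack all columns into a name column and a value column
long <- pivot_longer(wide, cols = everything(),
                     names_to = "Samples", values_to = "Content")

# pivot_longer walks the table row by row, so sort by sample to get
# the values grouped as in the original long layout
long <- long[order(long$Samples), ]
long
```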
    
    3. t-test: there are many ways to run a t-test on long-format data. For example, a base R approach to comparing Samples 1 and 2 might be:

      t.test(df1[df1$Samples == "Sample 1", "Content"], 
             df1[df1$Samples == "Sample 2", "Content"])
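
Another option that keeps the data in long form is the formula interface of t.test. This is a sketch rather than the answer's own method; the df1 below is an illustrative subset (the first five values of Samples 1 to 3 from the question). The formula interface needs exactly two groups, so the data are filtered first.

```r
# Illustrative subset of the question's data
df1 <- data.frame(
  Samples = rep(c("Sample 1", "Sample 2", "Sample 3"), each = 5),
  Content = c(70.7, 91.6, 92.6, 65.2, 80.0,
              88.7, 67.6, 61.7, 70.8, 45.3,
              42.3, 48.2, 64.0, 33.3, 19.0)
)

# Keep only the two samples to compare
two <- subset(df1, Samples %in% c("Sample 1", "Sample 2"))

# Welch two-sample t-test of Content grouped by Samples
res <- t.test(Content ~ Samples, data = two)
res$p.value
```

One thing to watch for with the indexing approach: if df1 is a tibble rather than a plain data frame, df1[rows, "Content"] returns a one-column tibble instead of a numeric vector, so the formula interface or df1$Content[rows] may be the safer choice.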
      
        2
  •  1
  •   akrun    6 years ago

    Since each 'Sample' has the same number of elements, we can use unstack from base R:

    unstack(df1, Content~Samples)
    #    Sample.1 Sample.2 Sample.3 Sample.4
    #1      70.7     88.7     42.3     51.8
    #2      91.6     67.6     48.2     57.3
    #3      92.6     61.7     64.0     43.3
    #4      65.2     70.8     33.3     42.3
    #5      80.0     45.3     19.0     37.6
    #6      82.1     55.6     41.0     54.9
    #7      88.1     64.6     53.1     71.1
    #8      92.2     62.7     46.5     33.8
    #9      53.3     72.4     30.0     43.1
    #10     80.0     46.8     43.4     39.1
    #11     60.3     59.0     43.7     63.0
    #12     89.7     63.7     92.0     74.0
    #13     84.8     67.0     53.0     31.0
    #14     94.0     71.6     33.0     48.3
    #15     71.8     48.3     48.4     42.9
    #16     76.9     55.6     43.2     62.2
    #17     91.4     62.5     41.8     35.4
    #18     57.9     60.0     62.5     33.8
    #19     61.9     72.9     33.3     40.7
    #20     71.5     47.4     49.3     41.2
    

    No external packages are used.
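
For the reverse direction asked about in the question, base R's stack is the counterpart of unstack. A small sketch on an illustrative subset of the data (first three values of two samples) follows. Note that unstack turns "Sample 1" into the syntactic name Sample.1, and stack returns columns named values and ind.

```r
# Illustrative subset of the question's data
df1 <- data.frame(
  Samples = rep(c("Sample 1", "Sample 2"), each = 3),
  Content = c(70.7, 91.6, 92.6, 88.7, 67.6, 61.7)
)

wide <- unstack(df1, Content ~ Samples)  # columns: Sample.1, Sample.2
long <- stack(wide)                      # columns: values, ind

# Optional: restore the original column names and order
names(long) <- c("Content", "Samples")
long <- long[, c("Samples", "Content")]
long
```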


    If the number of elements differs between 'Sample' groups, dcast from data.table can be used instead (it works in both cases):

    library(data.table)
    dcast(setDT(df1), rowid(Samples)~Samples, value.var = "Content")
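
To illustrate the unequal-length case, here is a minimal sketch using a made-up df2 in which Sample 1 has three values and Sample 2 only two (the values themselves are taken from the question). The shorter group should be padded with NA; the first column of the result holds the within-sample row index.

```r
library(data.table)

# Hypothetical unbalanced data: 3 values for Sample 1, 2 for Sample 2
df2 <- data.frame(
  Samples = c(rep("Sample 1", 3), rep("Sample 2", 2)),
  Content = c(70.7, 91.6, 92.6, 88.7, 67.6)
)

# rowid(Samples) numbers the rows within each sample, which gives dcast
# a unique row key for the wide table
wide <- dcast(setDT(df2), rowid(Samples) ~ Samples, value.var = "Content")
wide
```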