
Matching sites across different data frames in R

  • Fabio Favoretto  ·  7 years ago

    I have several data frames that look like this:

    d1

    Year   Region  Sites      Depth Transect Pharia pyramidatus
    2000   LP     BALLENA      5        1        0.03
    2000   LP     ISLOTES      5        1        0.20
    2000   LP     NORTE        5        1        0.10
    2000   LP     NORTE       20        1        0.00
    

    d2

    Year   Region  Sites      Depth Transect Pharia pyramidatus
    2010   LP     PLAYA        5        1        0.03
    2010   LP     ISLOTES      5        1        0.20
    2010   LP     NORTE        5        1        0.10
    2010   LP     NORTE       20        1        0.00
    

    d3

    Year   Region  Sites      Depth Transect Pharia pyramidatus
    2016   LP     BALLENA      5        1        0.03
    2016   LP     ISLOTES      5        1        0.20
    2016   LP     SUR          5        1        0.10
    2016   LP     NORTE       20        1        0.00
    

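    What I would like to do is extract the sites (the Sites column above, named Reef below) that appear in every year, like this: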

    Year   Region  Reef      Depth Transect Pharia pyramidatus
    2000   LP     ISLOTES      5        1        0.20
    2000   LP     NORTE        5        1        0.10
    2000   LP     NORTE       20        1        0.00
    2010   LP     ISLOTES      5        1        0.20
    2010   LP     NORTE        5        1        0.10
    2010   LP     NORTE       20        1        0.00
    2016   LP     ISLOTES      5        1        0.20
    2016   LP     NORTE       20        1        0.00
    

    Thank you very much for your help.

    1 answer

  • acylam  ·  7 years ago

    A solution with dplyr:

    library(dplyr)

    # Stack the three data frames, then keep only the reefs recorded in all 3 years
    rbind(df1, df2, df3) %>%
      group_by(Reef) %>%
      filter(n_distinct(Year) == 3)
    

    Result:

    # A tibble: 8 x 6
    # Groups:   Reef [2]
       Year Region    Reef Depth Transect Pharia_pyramidatus
      <int> <fctr>  <fctr> <int>    <int>              <dbl>
    1  2000     LP ISLOTES     5        1                0.2
    2  2000     LP   NORTE     5        1                0.1
    3  2000     LP   NORTE    20        1                0.0
    4  2010     LP ISLOTES     5        1                0.2
    5  2010     LP   NORTE     5        1                0.1
    6  2010     LP   NORTE    20        1                0.0
    7  2016     LP ISLOTES     5        1                0.2
    8  2016     LP   NORTE    20        1                0.0
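
    To see why only ISLOTES and NORTE survive the filter, it can help to count the distinct years per reef before filtering. The following is only a sketch on the toy data from this answer (the name year_counts is made up for illustration); a reef such as BALLENA, which is missing from the 2010 data, ends up with fewer than 3 distinct years and is dropped.

```r
library(dplyr)

# Toy data copied from the Data section of this answer
df1 <- read.table(text = "Year Region Reef Depth Transect Pharia_pyramidatus
2000 LP BALLENA 5 1 0.03
2000 LP ISLOTES 5 1 0.20
2000 LP NORTE 5 1 0.10
2000 LP NORTE 20 1 0.00", header = TRUE)

df2 <- read.table(text = "Year Region Reef Depth Transect Pharia_pyramidatus
2010 LP PLAYA 5 1 0.03
2010 LP ISLOTES 5 1 0.20
2010 LP NORTE 5 1 0.10
2010 LP NORTE 20 1 0.00", header = TRUE)

df3 <- read.table(text = "Year Region Reef Depth Transect Pharia_pyramidatus
2016 LP BALLENA 5 1 0.03
2016 LP ISLOTES 5 1 0.20
2016 LP SUR 5 1 0.10
2016 LP NORTE 20 1 0.00", header = TRUE)

# Count in how many distinct years each reef was recorded
# (year_counts is a hypothetical name used only in this sketch)
year_counts <- rbind(df1, df2, df3) %>%
  group_by(Reef) %>%
  summarise(n_years = n_distinct(Year))

print(year_counts)
```

    Reefs whose n_years equals the total number of years are exactly the ones kept by filter(n_distinct(Year) == 3).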
    

    Notes:

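    n_distinct counts the number of distinct Year values for each Reef (since I group_by(Reef)). I want n_distinct(Year) == 3 because I only want to return Reefs that have records in every Year, which is 3 years in this case. In the more general case, where there are more Years, you would first find the number of distinct Years and then filter based on that, as follows: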

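    rbind(df1, df2, df3) %>%
      # Total number of distinct years, computed before grouping
      mutate(Year_distinct = n_distinct(Year)) %>%
      group_by(Reef) %>%
      # Keep reefs whose own number of distinct years matches the total
      filter(n_distinct(Year) == Year_distinct) %>%
      select(-Year_distinct)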
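
    For comparison, the same idea can be sketched in base R by intersecting the Reef columns directly. This is not part of the dplyr solution above, and it rests on an assumption: each data frame holds exactly one year, so "present in every data frame" and "present in every Year" mean the same thing. The names dfs, common_reefs, all_years and result are made up for this sketch.

```r
# Toy data copied from the Data section of this answer
df1 <- read.table(text = "Year Region Reef Depth Transect Pharia_pyramidatus
2000 LP BALLENA 5 1 0.03
2000 LP ISLOTES 5 1 0.20
2000 LP NORTE 5 1 0.10
2000 LP NORTE 20 1 0.00", header = TRUE)

df2 <- read.table(text = "Year Region Reef Depth Transect Pharia_pyramidatus
2010 LP PLAYA 5 1 0.03
2010 LP ISLOTES 5 1 0.20
2010 LP NORTE 5 1 0.10
2010 LP NORTE 20 1 0.00", header = TRUE)

df3 <- read.table(text = "Year Region Reef Depth Transect Pharia_pyramidatus
2016 LP BALLENA 5 1 0.03
2016 LP ISLOTES 5 1 0.20
2016 LP SUR 5 1 0.10
2016 LP NORTE 20 1 0.00", header = TRUE)

# Assumption: one data frame per year
dfs <- list(df1, df2, df3)

# Reefs present in every data frame: fold intersect() over the Reef columns
# (as.character guards against Reef being read in as a factor)
common_reefs <- Reduce(intersect, lapply(dfs, function(d) as.character(d$Reef)))

# Stack all years and keep only the rows for the shared reefs
all_years <- do.call(rbind, dfs)
result <- all_years[all_years$Reef %in% common_reefs, ]

print(result)
```

    If a single data frame could contain several years, the grouped n_distinct(Year) filter is the safer choice, because it checks years rather than data frames.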
    

    Data:

    df1 = read.table(text = "Year   Region  Reef      Depth Transect Pharia_pyramidatus
                     2000   LP     BALLENA      5        1        0.03
                     2000   LP     ISLOTES      5        1        0.20
                     2000   LP     NORTE        5        1        0.10
                     2000   LP     NORTE       20        1        0.00", header = TRUE)
    
    df2 = read.table(text = "Year   Region  Reef      Depth Transect Pharia_pyramidatus
                     2010   LP     PLAYA        5        1        0.03
                     2010   LP     ISLOTES      5        1        0.20
                     2010   LP     NORTE        5        1        0.10
                     2010   LP     NORTE       20        1        0.00", header = TRUE)
    
    df3 = read.table(text = "Year   Region  Reef      Depth Transect Pharia_pyramidatus
                     2016   LP     BALLENA      5        1        0.03
                     2016   LP     ISLOTES      5        1        0.20
                     2016   LP     SUR          5        1        0.10
                     2016   LP     NORTE       20        1        0.00", header = TRUE)