代码之家  ›  专栏  ›  技术社区  ›  Tim Wilcox

如何最好地加入R?

  •  1
  • Tim Wilcox  · 技术社区  · 2 年前

    以下是示例数据。我知道我必须做左联合。问题是如何让它只返回匹配(indcodelist=indcodelist2)但具有最高代码类型值的值。

     indcodelist <- c(110000,111000,112000,113000,114000,115000,121000,210000,211000,315000)
     estemp <- c(11,21,31,41,51,61,55,21,22,874)
     projemp <- c(15,25,36,45,52,61,31,29,31,899)
     nchg <- c(4,4,5,4,1,0,-24,8,9,25)
    
    
     firsttable <- data.frame(indcodelist,estemp,projemp,nchg)
    
    
      indcodelist2 <- c(110000,111000,112000,113000,114000,115000,121000,210000,211000,315000,110000,111000,112000,113000)
      codetype <- c(18,18,18,18,18,18,18,18,18,18,10,10,10,10)
      codetitle <- c("Accountant","Doctor","Lawyer","Teacher","Economist","Financial Analyst","Meteorologist","Dentist", "Editor","Veterinarian","Accounting Technician","Doctor","Lawyer","Teacher")
      secondtable <- data.frame(indcodelist2,codetype,codetitle)
    
    
      tried <- left_join(firsttable,secondtable, by =c(indcodelist = "indcodelist2"))
    
      Desired Result
    
      indcodelist       estemp     projemp      nchg        codetitle   
    
      110000              11          15           4           Accountant
      111000              21          25           4           Doctor
    
    1 回复  |  直到 2 年前
        1
  •  1
  •   TrainingPizza    2 年前

    如果只希望两个表中的值匹配, inner_join 可能正是你想要的。你可以看到 this answer 以了解不同类型的联接。

    获得最高 codetype ,您可以使用 dplyr::slice_max() 。请注意,默认行为是返回相等的值。如果不止一个 codetitle 同时 代码类型 ,它们都会被退回。

    library(tidyverse)
    
    firsttable %>%
      inner_join(., secondtable, by = c("indcodelist" = "indcodelist2")) %>%
      group_by(indcodelist) %>%
      slice_max(codetype)
    #> # A tibble: 10 × 6
    #> # Groups:   indcodelist [10]
    #>    indcodelist estemp projemp  nchg codetype codetitle        
    #>          <dbl>  <dbl>   <dbl> <dbl>    <dbl> <chr>            
    #>  1      110000     11      15     4       18 Accountant       
    #>  2      111000     21      25     4       18 Doctor           
    #>  3      112000     31      36     5       18 Lawyer           
    #>  4      113000     41      45     4       18 Teacher          
    #>  5      114000     51      52     1       18 Economist        
    #>  6      115000     61      61     0       18 Financial Analyst
    #>  7      121000     55      31   -24       18 Meteorologist    
    #>  8      210000     21      29     8       18 Dentist          
    #>  9      211000     22      31     9       18 Editor           
    #> 10      315000    874     899    25       18 Veterinarian
    

    创建于2022-09-15由 reprex package (v2.0.1)

        2
  •  0
  •   moodymudskipper    2 年前

    您可以使用{powerjoin}:

    library(powerjoin)
    power_inner_join(
      firsttable, 
      secondtable |> summarize_by_keys(dplyr::across()[which.max(codetype),]), 
      by = c("indcodelist" = "indcodelist2")
    )
    #>    indcodelist estemp projemp nchg codetype         codetitle
    #> 1       110000     11      15    4       18        Accountant
    #> 2       111000     21      25    4       18            Doctor
    #> 3       112000     31      36    5       18            Lawyer
    #> 4       113000     41      45    4       18           Teacher
    #> 5       114000     51      52    1       18         Economist
    #> 6       115000     61      61    0       18 Financial Analyst
    #> 7       121000     55      31  -24       18     Meteorologist
    #> 8       210000     21      29    8       18           Dentist
    #> 9       211000     22      31    9       18            Editor
    #> 10      315000    874     899   25       18      Veterinarian