
R: Keep only the rows where a value differs from the value in another column

  • bandcar  · 2 years ago

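    I want to keep only the rows where the last two letters of column 1 (the state abbreviation) differ from the last two letters of column 3.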

      countyname            fipscounty   neighborname            fipsneighbor
    1 Archuleta County, CO  8007         Rio Grande County, CO   8105
    2 Archuleta County, CO  8007         Rio Arriba County, NM   35039
    3 Archuleta County, CO  8007         San Juan County, NM     35045
    

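    In the first row, both counties are in Colorado. In rows 2 and 3, the first county is in CO and the second is in NM. I only want to keep rows 2 and 3, so that the result looks like this: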

      countyname            fipscounty   neighborname            fipsneighbor
    2 Archuleta County, CO  8007         Rio Arriba County, NM   35039
    3 Archuleta County, CO  8007         San Juan County, NM     35045
    

    How can I do this?

    1 answer
  •   AndrewGB  · 2 years ago

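    We can use str_sub to extract the last two characters of each name and return only the rows where the state abbreviations do not match.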

    library(tidyverse)
    
    # Keep rows where the last two characters (the state abbreviation) differ
    df %>% 
      filter(str_sub(countyname, start = -2) != str_sub(neighborname, start = -2))
    

    Output

                countyname fipscounty          neighborname fipsneighbor
    1 Archuleta County, CO       8007 Rio Arriba County, NM        35039
    2 Archuleta County, CO       8007   San Juan County, NM        35045
    

    Or, in base R, we can use sub to strip everything except the last two characters and then subset the data frame.

    df[sub('.*(?=.{2}$)', '', df$countyname, perl = TRUE) !=
         sub('.*(?=.{2}$)', '', df$neighborname, perl = TRUE), ]
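
To see what the regular expression in that call does, here is a minimal sketch applied to two sample strings taken from the question. The variable names `x` and `state` are only illustrative and are not part of the original answer.

```r
# Sample values taken from the question's data.
x <- c("Archuleta County, CO", "Rio Arriba County, NM")

# `.*` is greedy, and the lookahead `(?=.{2}$)` requires exactly two
# characters to remain before the end of the string. So everything
# except the final two characters is matched and replaced with "".
state <- sub(".*(?=.{2}$)", "", x, perl = TRUE)
print(state)
```

Note that `perl = TRUE` is needed because lookahead assertions are not supported by the default regex engine used by `sub`. If a string has fewer than two characters, the pattern does not match and `sub` returns the string unchanged.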
    

    Or use substr (although it is much more verbose):

    df[substr(df$countyname, nchar(df$countyname)-1, nchar(df$countyname)) !=
    substr(df$neighborname, nchar(df$neighborname)-1, nchar(df$neighborname)),]
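
One way to cut down the repetition in the substr version is to wrap it in a small helper. The sketch below assumes a hypothetical function named `last_chars`, which is not part of the original answer. It also trims surrounding whitespace first, on the assumption that real data might contain stray trailing spaces; the second `county` value has one added purely for illustration.

```r
# Hypothetical helper (not from the original answer): return the last
# `n` characters of each string, ignoring surrounding whitespace.
last_chars <- function(x, n = 2L) {
  x <- trimws(x)
  len <- nchar(x)
  substr(x, len - n + 1L, len)
}

# Illustrative vectors; the trailing space in the second county name
# is an assumption added to show the effect of trimws().
county   <- c("Archuleta County, CO", "Archuleta County, CO ")
neighbor <- c("Rio Grande County, CO", "Rio Arriba County, NM")

# TRUE where the two state abbreviations differ.
differs <- last_chars(county) != last_chars(neighbor)
print(differs)
```

With such a helper, the filter could be written as `df[last_chars(df$countyname) != last_chars(df$neighborname), ]`.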
    

    Data

    df <- structure(list(countyname = c("Archuleta County, CO", "Archuleta County, CO", 
    "Archuleta County, CO"), fipscounty = c(8007L, 8007L, 8007L), 
        neighborname = c("Rio Grande County, CO", "Rio Arriba County, NM", 
        "San Juan County, NM"), fipsneighbor = c(8105L, 35039L, 35045L
        )), class = "data.frame", row.names = c(NA, -3L))
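
As a quick end-to-end check, the following sketch rebuilds the same data with `data.frame` and applies the base R substr filter. The names `n1`, `n2`, `keep`, and `result` are only illustrative. One detail worth noticing: base R subsetting keeps the original row names (2 and 3, as in the desired output in the question), whereas the dplyr output shown above is renumbered from 1.

```r
# Same data as in the answer, rebuilt with data.frame() for readability.
df <- data.frame(
  countyname   = rep("Archuleta County, CO", 3),
  fipscounty   = c(8007L, 8007L, 8007L),
  neighborname = c("Rio Grande County, CO", "Rio Arriba County, NM",
                   "San Juan County, NM"),
  fipsneighbor = c(8105L, 35039L, 35045L),
  stringsAsFactors = FALSE
)

# Compare the last two characters of each name.
n1 <- nchar(df$countyname)
n2 <- nchar(df$neighborname)
keep <- substr(df$countyname, n1 - 1, n1) !=
  substr(df$neighborname, n2 - 1, n2)

# Logical subsetting preserves the original row names.
result <- df[keep, ]
print(result)
```

If renumbered rows are preferred, `rownames(result) <- NULL` resets them to 1, 2.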