
Selecting strings that do not contain a specific pattern with R regex

  • Gautam  · 6 years ago

    I have a set of files, and I want to select those whose names do not contain the terms "dataset" or "eff".

    Data

    k <- c("Duct1/X SN5 F9MH.csv", "Duct1/X SN5 F9MH_dataset.csv", "Duct1/X SN5 F9MH_eff.csv", 
    "Duct2/X F7 X300 E10.csv", "Duct2/X F7 X300 E10_dataset.csv", 
    "Duct2/X F7 X300 E10_eff.csv", "Duct3/X600 F8 X600 E10.csv", 
    "Duct3/X600 F8 X600 E10_dataset.csv", "Duct3/X600 F8 X600 E10_eff.csv", 
    "Duct4/X F7 X600 E10.csv", "Duct4/X F7 X600 E10_dataset.csv", 
    "Duct4/X F7 X600 E10_eff.csv")
    

    As far as I know, I can use [^...] to exclude certain characters (the ones written in place of ...) from the results.

    Trying this with N:

    # Looking for N works 
    > grep('.*[N].*', k, value = T)
    [1] "Duct1/X SN5 F9MH.csv"         "Duct1/X SN5 F9MH_dataset.csv" "Duct1/X SN5 F9MH_eff.csv"    
    
    # Looking for strings not containing N does not work 
    > grep('.*[!N].*', k, value = T)
    [1] "Duct1/X SN5 F9MH.csv"         "Duct1/X SN5 F9MH_dataset.csv" "Duct1/X SN5 F9MH_eff.csv"    
    
    # Trying with ^ also does not work 
    > grep('.*[^N].*', k, value = T)
     [1] "Duct1/X SN5 F9MH.csv"               "Duct1/X SN5 F9MH_dataset.csv"       "Duct1/X SN5 F9MH_eff.csv"          
     [4] "Duct2/X F7 X300 E10.csv"            "Duct2/X F7 X300 E10_dataset.csv"    "Duct2/X F7 X300 E10_eff.csv"       
     [7] "Duct3/X600 F8 X600 E10.csv"         "Duct3/X600 F8 X600 E10_dataset.csv" "Duct3/X600 F8 X600 E10_eff.csv"    
    [10] "Duct4/X F7 X600 E10.csv"            "Duct4/X F7 X600 E10_dataset.csv"    "Duct4/X F7 X600 E10_eff.csv" 
    

    I can get the result with grepl and use it to subset the character vector:

    > k[!grepl(pattern = 'N', x = k)]
    [1] "Duct2/X F7 X300 E10.csv"            "Duct2/X F7 X300 E10_dataset.csv"    "Duct2/X F7 X300 E10_eff.csv"       
    [4] "Duct3/X600 F8 X600 E10.csv"         "Duct3/X600 F8 X600 E10_dataset.csv" "Duct3/X600 F8 X600 E10_eff.csv"    
    [7] "Duct4/X F7 X600 E10.csv"            "Duct4/X F7 X600 E10_dataset.csv"    "Duct4/X F7 X600 E10_eff.csv" 
    

    For my actual use case (dataset|eff):

    > k[!grepl(pattern = 'eff|dataset', x = k)]
    [1] "Duct1/X SN5 F9MH.csv"       "Duct2/X F7 X300 E10.csv"    "Duct3/X600 F8 X600 E10.csv"
    [4] "Duct4/X F7 X600 E10.csv"   
    

    But I am looking for a way to do this with grep(... , value = T), because I do not want to store the character vector (k).

    1 Answer
  •   Onyambu    6 years ago
    grep('N', k, value = TRUE, invert = TRUE)
    [1] "Duct2/X F7 X300 E10.csv"           
    [2] "Duct2/X F7 X300 E10_dataset.csv"   
    [3] "Duct2/X F7 X300 E10_eff.csv"       
    [4] "Duct3/X600 F8 X600 E10.csv"        
    [5] "Duct3/X600 F8 X600 E10_dataset.csv"
    [6] "Duct3/X600 F8 X600 E10_eff.csv"    
    [7] "Duct4/X F7 X600 E10.csv"           
    [8] "Duct4/X F7 X600 E10_dataset.csv"   
    [9] "Duct4/X F7 X600 E10_eff.csv"
    
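    For context on why the bracket attempts in the question return what they do: inside [...], ! is an ordinary character, so [!N] matches a literal "!" or "N", while [^N] matches any single character other than "N", and every file name here contains at least one. A minimal sketch, using a shortened copy of the vector under the hypothetical name files:

```r
# Hypothetical shortened copy of the question's vector
files <- c("Duct1/X SN5 F9MH.csv",
           "Duct2/X F7 X300 E10.csv",
           "Duct2/X F7 X300 E10_eff.csv")

# "[!N]" is a character class containing "!" and "N"; nothing is negated
bang_class <- grepl("[!N]", files)

# "[^N]" needs only ONE non-N character anywhere in the string to match
caret_class <- grepl("[^N]", files)

# Anchoring the negated class forces EVERY character to be a non-N
anchored <- grep("^[^N]*$", files, value = TRUE)

# invert = TRUE returns the elements that do not match the pattern
inverted <- grep("N", files, value = TRUE, invert = TRUE)
```

    The anchored negated class only helps for single characters; for a multi-character term such as dataset, invert = TRUE is probably the simpler route.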

    grep('eff|dataset', k, invert = TRUE, value = TRUE)
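    If the exclusion needs to be expressed in the pattern itself rather than through an argument, a negative lookahead is one option. The sketch below assumes PCRE via perl = TRUE, and files is a hypothetical shortened vector, not the full data from the question:

```r
# Hypothetical shortened copy of the question's vector
files <- c("Duct1/X SN5 F9MH.csv",
           "Duct1/X SN5 F9MH_dataset.csv",
           "Duct1/X SN5 F9MH_eff.csv",
           "Duct2/X F7 X300 E10.csv")

# invert = TRUE: keep the elements that do NOT match either term
by_invert <- grep("eff|dataset", files, invert = TRUE, value = TRUE)

# Negative lookahead: match at the start only if neither term appears later
by_lookahead <- grep("^(?!.*(eff|dataset))", files, perl = TRUE, value = TRUE)

# Both approaches should select the same file names
identical(by_invert, by_lookahead)
```

    Because x can be any expression that returns a character vector, the listing can also be passed in directly without storing it first, e.g. grep('eff|dataset', list.files(recursive = TRUE), invert = TRUE, value = TRUE).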