
Regex to extract up to 3 consecutive words, if present

  •  1
  • Mislav  · Tech community  · 6 years ago

    I am looking for a regex that extracts up to 3 consecutive words, if they are present. For example, given these two strings:

    "1. Stack is great and awesome"
    "2. Stack"
    

    The expected result is:

    "Stack is great"
    "Stack" 
    

    This answer did not work for me: regex: matching 3 consecutive words

    My attempt:

    (?:[A-ZŠČĆŽa-zščćž]+ )(?:[A-ZŠČĆŽa-zščćž]+ )(?:[A-ZŠČĆŽa-zščćž]+ )
    
    1 reply  |  as of 6 years ago
  •  3
  •   Wiktor Stribiżew    6 years ago

    > x <- c("1. Stack is great and awesome", "2. Stack")
    > regmatches(x, regexpr("[A-Za-z]+(?:\\s+[A-Za-z]+){0,2}", x))
    [1] "Stack is great" "Stack"
    ## Or to support all Unicode letters
    > y <- c("1. Stąck is great and awesome", "2. Stack")
    > regmatches(y, regexpr("\\p{L}+(?:\\s+\\p{L}+){0,2}", y, perl=TRUE))
    [1] "Stąck is great" "Stack"
    ## In some R environments, it makes sense to use the default TRE engine with POSIX classes instead:
    > regmatches(y, regexpr("[[:alpha:]]+(?:[[:space:]]+[[:alpha:]]+){0,2}", y))
    [1] "Stąck is great" "Stack"
    


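    Note that the regex extracts the first run of 1, 2, or 3 words from each string. If you need at least 2 words, replace the {0,2} quantifier with {1,2}.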
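
    As a minimal sketch of how the {0,2} and {1,2} quantifiers differ, reusing the x vector from the answer (the names up_to_three and at_least_two are only illustrative):

```r
x <- c("1. Stack is great and awesome", "2. Stack")

# {0,2}: the second and third words are optional, so a lone word still matches
up_to_three <- regmatches(x, regexpr("[A-Za-z]+(?:\\s+[A-Za-z]+){0,2}", x))

# {1,2}: at least one more word must follow the first one
at_least_two <- regmatches(x, regexpr("[A-Za-z]+(?:\\s+[A-Za-z]+){1,2}", x))

print(up_to_three)
print(at_least_two)
```

    Keep in mind that regmatches drops elements that have no match, so with {1,2} the result is shorter than the input: "2. Stack" contributes nothing.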

    To extract every match in a string rather than only the first one, use gregexpr instead of regexpr.
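
    A short sketch of the gregexpr variant, assuming the same x vector as in the answer (all_chunks is an illustrative name). With gregexpr, regmatches returns a list holding one character vector per input string:

```r
x <- c("1. Stack is great and awesome", "2. Stack")

# gregexpr finds every non-overlapping match, not just the first one
all_chunks <- regmatches(x, gregexpr("[A-Za-z]+(?:\\s+[A-Za-z]+){0,2}", x))

# One list element per input string, each holding that string's matches
print(all_chunks)
```

    The search resumes after each match, so the words left over after the first three form a shorter chunk of their own.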

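    • \\p{L}+ / [A-Za-z]+ : one or more Unicode letters (ASCII letters only when [A-Za-z]+ is used)
    • (?:\\s+\\p{L}+){0,2} / (?:\\s+[a-zA-Z]+){0,2} : zero, one, or two consecutive occurrences of:
      • \\s+ : one or more whitespace characters
      • \\p{L}+ / [a-zA-Z]+ : one or more letters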

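    Remember to pass perl=TRUE when the pattern uses \p{L}. If it still does not match non-ASCII letters, try adding the PCRE verb (*UCP) at the very start of the pattern, which makes the generic and shorthand character classes Unicode-aware.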
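
    A hedged sketch of the (*UCP) variant, assuming an R build whose PCRE library has Unicode property support (pcre_config() reports this). The y_utf8 and ucp_match names are illustrative, and the string is written with a \u escape so that it is UTF-8 encoded regardless of the source file encoding:

```r
# "\u0105" is the letter a-with-ogonek used in the answer's Unicode example
y_utf8 <- c("1. St\u0105ck is great and awesome", "2. Stack")

# (*UCP) must be the very first thing in the pattern; perl = TRUE selects PCRE
ucp_match <- regmatches(
  y_utf8,
  regexpr("(*UCP)\\p{L}+(?:\\s+\\p{L}+){0,2}", y_utf8, perl = TRUE)
)

print(ucp_match)
```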

    All of these patterns also work with stringr::str_extract and stringr::str_extract_all:

    > library(stringr)
    > str_extract(x, "\\p{L}+(?:\\s+\\p{L}+){0,2}")
    [1] "Stack is great" "Stack"         
    > str_extract(x, "[a-zA-Z]+(?:\\s+[a-zA-Z]+){0,2}")
    [1] "Stack is great" "Stack"         
    > str_extract(x, "[[:alpha:]]+(?:\\s+[[:alpha:]]+){0,2}")
    [1] "Stack is great" "Stack" 
    

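    stringr does not support (*UCP), because it uses the ICU regex engine rather than PCRE. A Unicode test with stringr: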

    > str_extract(y, "\\p{L}+(?:\\s+\\p{L}+){0,2}")
    [1] "Stąck is great" "Stack"
    > str_extract(y, "[[:alpha:]]+(?:\\s+[[:alpha:]]+){0,2}")
    [1] "Stąck is great" "Stack"