Regex to extract up to 3 consecutive words

regex r

Mislav · 6 years ago

I'm looking for a regex that extracts up to 3 consecutive words, if they are present. For example, given these two strings:

"1. Stack is great and awesome"
"2. Stack"

the result should be:

"Stack is great"
"Stack"

This answer doesn't work for me: regex: matching 3 consecutive words

My attempt:

(?:[A-ZŠČĆŽa-zščćž]+ )(?:[A-ZŠČĆŽa-zščćž]+ )(?:[A-ZŠČĆŽa-zščćž]+ )

1 answer

Wiktor Stribiżew · 6 years ago

> x <- c("1. Stack is great and awesome", "2. Stack")
> regmatches(x, regexpr("[A-Za-z]+(?:\\s+[A-Za-z]+){0,2}", x))
[1] "Stack is great" "Stack"
## Or to support all Unicode letters
> y <- c("1. Stąck is great and awesome", "2. Stack")
> regmatches(y, regexpr("\\p{L}+(?:\\s+\\p{L}+){0,2}", y, perl=TRUE))
[1] "Stąck is great" "Stack"
## In some R environments, it makes sense to use another, TRE, regex:
> regmatches(y, regexpr("[[:alpha:]]+(?:[[:space:]]+[[:alpha:]]+){0,2}", y))
[1] "Stąck is great" "Stack"

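Note that the pattern extracts the first run of 1, 2, or 3 words from each string. If you need at least 2 words, replace the {0,2} quantifier with {1,2}.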
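To see what switching to {1,2} does, here is a small sketch reusing the question's sample strings. The variable name at_least_two is only illustrative.

```r
x <- c("1. Stack is great and awesome", "2. Stack")

# {1,2} instead of {0,2}: the first word must be followed by one or two more words
at_least_two <- regmatches(x, regexpr("[A-Za-z]+(?:\\s+[A-Za-z]+){1,2}", x, perl = TRUE))

# "2. Stack" has only one word, so it does not match, and regmatches()
# drops non-matching elements from a regexpr() result
print(at_least_two)
```

Keep in mind that the result can be shorter than the input vector, so it no longer lines up element by element with x.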

To extract all matches rather than only the first one, use gregexpr instead of regexpr.
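A minimal sketch of the gregexpr variant, again on the question's sample strings (the names pattern and all_chunks are only illustrative). With gregexpr, regmatches returns a list with one character vector per input string.

```r
x <- c("1. Stack is great and awesome", "2. Stack")
pattern <- "[A-Za-z]+(?:\\s+[A-Za-z]+){0,2}"

# gregexpr() finds every non-overlapping match in each string,
# so each list element holds all chunks of up to three words
all_chunks <- regmatches(x, gregexpr(pattern, x, perl = TRUE))
print(all_chunks)
```

For the first string this yields two chunks, "Stack is great" and "and awesome", because matching resumes right after the previous match.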

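Pattern details:

- \\p{L}+ or [A-Za-z]+ : one or more letters (any Unicode letter with \\p{L}, ASCII letters only with [A-Za-z])
- (?:\\s+\\p{L}+){0,2} or (?:\\s+[a-zA-Z]+){0,2} : zero, one, or two consecutive occurrences of:
  - \\s+ : one or more whitespace characters
  - \\p{L}+ or [a-zA-Z]+ : one or more letters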

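Remember to pass perl=TRUE when the pattern uses \p{L}. If it still does not match as expected, try adding the (*UCP) PCRE verb at the very start of the pattern.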
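As a sketch of where the (*UCP) verb goes, assuming R's PCRE build includes Unicode property support: the verb is placed at the very beginning of the pattern string. The "\u0105" escape stands for the letter ą and is used here only so the example does not depend on the source file's encoding; ucp_pattern and first_chunk are illustrative names.

```r
# "\u0105" is the letter a-ogonek; the escape marks the string as UTF-8
y <- c("1. St\u0105ck is great and awesome", "2. Stack")

# (*UCP) must come first in the pattern; it asks PCRE to use Unicode properties
ucp_pattern <- "(*UCP)\\p{L}+(?:\\s+\\p{L}+){0,2}"

first_chunk <- regmatches(y, regexpr(ucp_pattern, y, perl = TRUE))
print(first_chunk)
```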

The same patterns work with stringr::str_extract (or stringr::str_extract_all to get all matches):

> library(stringr)
> str_extract(x, "\\p{L}+(?:\\s+\\p{L}+){0,2}")
[1] "Stack is great" "Stack"
> str_extract(x, "[a-zA-Z]+(?:\\s+[a-zA-Z]+){0,2}")
[1] "Stack is great" "Stack"
> str_extract(x, "[[:alpha:]]+(?:\\s+[[:alpha:]]+){0,2}")
[1] "Stack is great" "Stack"

They also handle Unicode letters when you use stringr:

> str_extract(y, "\\p{L}+(?:\\s+\\p{L}+){0,2}")
[1] "Stąck is great" "Stack"
> str_extract(y, "[[:alpha:]]+(?:\\s+[[:alpha:]]+){0,2}")
[1] "Stąck is great" "Stack"
