You can use stringr::str_extract_all to get the list of matches from each row:
library(stringr)
library(tidyverse)
# simplify = TRUE returns a character matrix with one column per match
Job_Hist$matches <- str_extract_all(Job_Hist$Work.Experience,
                                    paste(Term_List, collapse = '|'), simplify = TRUE)
  Work.Experience                                                                     Term  matches.1 matches.2 matches.3
1 cooked food; cleaned house; made beds                                               FALSE
2 analyzed data; identified gaps; used sql, python, and r                             TRUE  sql       python    r
3 used tableau to make dashboards for clients; applied advanced macro excel functions TRUE  tableau   excel
4 financial planning and strategy; consulted with leaders and clients                 FALSE
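To see the shape that simplify = TRUE produces without needing the original data, here is a minimal sketch. The vectors x and terms are made-up stand-ins for Job_Hist$Work.Experience and Term_List, and the word boundaries around r are my assumption about how a single-letter term would be kept from matching inside other words.

```r
library(stringr)

# Hypothetical stand-ins for Job_Hist$Work.Experience and Term_List
x <- c("cooked food", "used sql, python, and r", "used tableau and excel")
terms <- c("sql", "python", "\\br\\b", "tableau", "excel")

# One row per input string, one column per match position;
# rows with fewer matches are padded with empty strings
m <- str_extract_all(x, paste(terms, collapse = "|"), simplify = TRUE)

dim(m)
m
```

Because the result is a matrix, assigning it to a single data frame column is what makes it print as matches.1, matches.2, and so on.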
Edit:
If you want the matches as a comma-separated string in a single column, you can use:
# Collapse each row's matches into one comma-separated string
Job_Hist$matches <- str_extract_all(Job_Hist$Work.Experience, paste(Term_List, collapse = '|')) %>%
  sapply(paste, collapse = ", ")
matches
1
2 sql, python, r
3 tableau, excel
4
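Note that if you use the default argument simplify = FALSE in str_extract_all, your matches column will appear to be correct, much like the result we get with sapply above. However, if you examine it with str() you will see that the column is actually a list, with each row holding its own character vector of matches, which causes problems for certain types of analysis.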
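The difference is easiest to see by running str() on both forms. This is a minimal sketch with made-up data: the Job_Hist and Term_List below are hypothetical stand-ins for the ones in the question, and the word boundaries around r are my assumption. I use vapply rather than sapply here only because it guarantees a character vector; sapply behaves the same way on this input.

```r
library(stringr)

# Hypothetical data standing in for the question's Job_Hist and Term_List
Job_Hist <- data.frame(
  Work.Experience = c(
    "cooked food; cleaned house; made beds",
    "analyzed data; identified gaps; used sql, python, and r"
  ),
  stringsAsFactors = FALSE
)
Term_List <- c("sql", "python", "\\br\\b")  # assumed: boundaries keep "r" from matching inside words
pattern <- paste(Term_List, collapse = "|")

# Default simplify = FALSE: a list with one character vector per row
as_list <- str_extract_all(Job_Hist$Work.Experience, pattern)
str(as_list)

# Collapsing each element gives an ordinary character vector
as_chr <- vapply(as_list, paste, character(1), collapse = ", ")
str(as_chr)
```

The first str() call reports a list of length 2, while the second reports a plain chr vector, which is the form most downstream functions expect in a data frame column.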