
Remove an element from a list if its string contains a stop word [duplicate]

nltk python-3.x python

Sociopath · 6 years ago

This question already has an answer:

How to remove items from a list that contains words found in items in another list [duplicate] (4 answers)

I have a list like this:

lst = ['for Sam', 'Just in', 'Mark Rich']

I am trying to remove the elements that contain stopwords.

Since the first and second elements contain 'for' and 'in', which are stop words, it should return

new_lst = ['Mark Rich']

What I tried

from nltk.corpus import stopwords

stop_words = set(stopwords.words('english'))

lst = ['for Sam', 'Just in', 'Mark Rich']
new_lst = [i.split(" ") for i in lst]
new_lst = [" ".join(i) for i in new_lst for j in i if j not in stop_words]

This outputs:

['for Sam', 'Just in', 'Mark Rich', 'Mark Rich']

2 answers | last active 6 years ago

jpp · 6 years ago

You need an if condition rather than an extra level of nesting:

new_lst = [' '.join(i) for i in new_lst if not any(j in i for j in stop_words)]
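To see why the nested version duplicates entries, the question's comprehension can be unrolled into plain loops. This is only an illustrative sketch: the small stop_words set is a stand-in assumption for the NLTK list, and the names tokenized, nested and filtered are made up for the example.

```python
# Stand-in for set(stopwords.words('english')); an assumption for illustration.
stop_words = {'for', 'in'}

lst = ['for Sam', 'Just in', 'Mark Rich']
tokenized = [s.split(" ") for s in lst]

# Unrolled equivalent of the question's nested comprehension:
# one item is appended per non-stop word, not per string.
nested = []
for words in tokenized:
    for word in words:
        if word not in stop_words:
            nested.append(" ".join(words))

# A single condition per string keeps or drops each string exactly once.
filtered = []
for words in tokenized:
    if not any(word in stop_words for word in words):
        filtered.append(" ".join(words))

print(nested)
print(filtered)
```

The first loop appends 'Mark Rich' once for 'Mark' and once for 'Rich', which is where the duplicate comes from.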

If you want to take advantage of set, you can use set.isdisjoint:

new_lst = [' '.join(i) for i in new_lst if stop_words.isdisjoint(i)]

Here is a demo:

stop_words = {'for', 'in'}

lst = ['for Sam', 'Just in', 'Mark Rich']
new_lst = [i.split() for i in lst]
new_lst = [' '.join(i) for i in new_lst if stop_words.isdisjoint(i)]

print(new_lst)

# ['Mark Rich']
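One caveat worth keeping in mind: the NLTK English stop word list is lowercase, so a capitalised token such as 'For' would not match it. A minimal sketch that lowercases each token before the check follows; the helper name remove_with_stop_words and the small stand-in set are assumptions for illustration, not part of the answer above.

```python
def remove_with_stop_words(strings, stop_words):
    """Return the strings that contain no stop word, ignoring case."""
    result = []
    for s in strings:
        # Lowercase every token so 'For' and 'for' are treated alike.
        tokens = {token.lower() for token in s.split()}
        if stop_words.isdisjoint(tokens):
            result.append(s)
    return result


# Stand-in for set(stopwords.words('english')); an assumption for illustration.
stop_words = {'for', 'in'}

lst = ['For Sam', 'Just in', 'Mark Rich']
print(remove_with_stop_words(lst, stop_words))
```

Whether case should be ignored depends on the data; if capitalised words are meaningful (names, for example), the original case-sensitive check may be preferable.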

yatu Sayali Sonawane · 6 years ago

You can use a list comprehension with sets to check whether the two collections share any words:

[i for i in lst if not set(stop_words) & set(i.split(' '))]
['Mark Rich']
