
How do I check whether an exact string exists in another string?

pattern-matching string python

akbiggs · Tech Community · 14 years ago

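I'm having some trouble with this. I'm trying to write a program that highlights occurrences of a word or phrase in another string, but only if the matched string is exactly the same. The part I'm stuck on is determining whether the phrase I'm matching is only part of a larger word or phrase.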

>>> indicators = ["therefore", "for", "since"]
>>> phrase = "... therefore, I conclude I am awesome."
>>> indicators_in_phrase = [indicator for indicator in indicators 
                            if indicator in phrase.lower()]
>>> print(indicators_in_phrase)
['therefore', 'for']

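I don't want "for" to be included in that list. I know why it's included, but I can't think of an expression that would filter out substrings like that.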

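I've noticed other similar questions on the site, but every one of them involves a regex solution, which I'm not comfortable with yet, especially in Python. Is there a simple way to solve this without regular expressions? If not, the corresponding regex for the example above, and how to use it, would be much appreciated.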

8 answers | last activity 14 years ago

Paulo Scardine 14 years ago

Here's a one-line regex...

import re

indicators = ["therefore", "for", "since"]
phrase = "... therefore, I conclude I am awesome."

indicators_in_phrase = set(re.findall(r'\b(%s)\b' % '|'.join(indicators), phrase.lower()))
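A sketch of the same alternation idea with a few assumed tweaks: `re.escape` keeps indicators containing characters such as "." or "+" literal, longer indicators are tried first so that a multi-word indicator like "for example" is not cut short by "for", and the matches come back in the order of `indicators` instead of as an unordered set. The function name `find_indicators` is hypothetical, and the indicators are assumed to be lowercase because the phrase is lowercased.

```python
import re

def find_indicators(phrase, indicators):
    # Try longer indicators first so "for example" wins over "for" in the alternation.
    ordered = sorted(indicators, key=len, reverse=True)
    # re.escape makes characters such as "." or "+" match literally.
    pattern = r'\b(%s)\b' % '|'.join(re.escape(ind) for ind in ordered)
    found = set(re.findall(pattern, phrase.lower()))
    # Report matches in the order of the original indicators list.
    return [ind for ind in indicators if ind in found]

indicators = ["therefore", "for", "since"]
phrase = "... therefore, I conclude I am awesome."
print(find_indicators(phrase, indicators))  # ['therefore']
```

Because `re.findall` returns non-overlapping matches, an indicator that only occurs inside a longer matched indicator is not reported separately.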

Ignacio Vazquez-Abrams 14 years ago

There is.

rubik 14 years ago

A regex is the easiest way!

re.compile(r'\btherefore\b')

Then you can swap any other word in between the \b markers!

EDIT: I wrote it for you:
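To illustrate what `\b` does here: it matches only at a boundary between a word character and a non-word character (or the edge of the string), so "for" inside "therefore" is rejected while a standalone "for" is found. The sample strings below are made up for the demonstration.

```python
import re

# \b asserts a word boundary, so "for" must stand alone as a word.
pattern = re.compile(r'\bfor\b')

print(pattern.search("... therefore, I conclude"))   # None: "for" sits inside "therefore"
print(pattern.search("good for you") is not None)    # True: standalone word
print(pattern.search("for, example") is not None)    # True: punctuation counts as a boundary
```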

import re

indicators = ["therefore", "for", "since"]

phrase = "... therefore, I conclude I am awesome. "

def find(phrase, indicators):
    def _match(i):
        return re.compile(r'\b%s\b' % (i)).search(phrase)
    return [ind for ind in indicators if _match(ind)]

>>> find(phrase, indicators)
['therefore']
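The `find` function above searches the phrase as-is, so an indicator capitalized at the start of a sentence ("Therefore, ...") would be missed. A sketch of a case-insensitive variant, assuming that is the desired behavior; the name `find_ci` is hypothetical, and `re.escape` is added so each indicator is matched literally.

```python
import re

def find_ci(phrase, indicators):
    # re.IGNORECASE lets "Therefore" match the indicator "therefore";
    # re.escape keeps any regex metacharacters in an indicator literal.
    return [ind for ind in indicators
            if re.search(r'\b%s\b' % re.escape(ind), phrase, re.IGNORECASE)]

print(find_ci("Therefore, since I am awesome...", ["therefore", "for", "since"]))
# ['therefore', 'since']
```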

jgritty 14 years ago

# Split on whitespace (string.split() no longer exists in Python 3)
words_in_phrase = phrase.split()

Now you have the words in a list like this:

['...', 'therefore,', 'I', 'conclude', 'I', 'am', 'awesome.']

indicators_in_phrase = []
for word in words_in_phrase:
    if word in indicators:
        indicators_in_phrase.append(word)

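There are probably a few ways to make this less verbose, but I prefer clarity. Also, you may need to strip punctuation, as in "awesome." and "therefore,".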
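Following up on the punctuation point, a sketch that trims leading and trailing punctuation from each token with `str.strip(string.punctuation)` and lowercases it before the comparison. It only handles single-word indicators and assumes the indicators are already lowercase.

```python
import string

indicators = ["therefore", "for", "since"]
phrase = "... therefore, I conclude I am awesome."

indicators_in_phrase = []
for word in phrase.split():
    # Remove punctuation such as "," or "." from both ends of the token.
    cleaned = word.strip(string.punctuation).lower()
    if cleaned in indicators and cleaned not in indicators_in_phrase:
        indicators_in_phrase.append(cleaned)

print(indicators_in_phrase)  # ['therefore']
```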

theReverseFlick 14 years ago

Create a set of the indicators
Find the intersection

Code:

indicators = ["therefore", "for", "since"]
phrase = "... therefore, I conclude I am awesome."
print(list(set(indicators).intersection(set( [ each.strip('.,') for each in phrase.split(' ')]))))
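`set.intersection` returns an unordered set, so with several matches the printed order can vary. A sketch that keeps a set of words for fast membership tests but reports matches in the order of `indicators`; stripping all of `string.punctuation` rather than only `'.,'`, and lowercasing, are assumptions about what should be ignored. The sample phrase is made up.

```python
import string

indicators = ["therefore", "for", "since"]
phrase = "... since I think, therefore I am."

# Build the set of cleaned, lowercased words once for constant-time lookups.
words = {w.strip(string.punctuation).lower() for w in phrase.split()}
print([ind for ind in indicators if ind in words])  # ['therefore', 'since']
```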

pyfunc 14 years ago

A bit verbose, but it gives you the idea. Of course, a regex would make it simpler.

>>> indicators = ["therefore", "for", "since"]
>>> phrase = "... therefore, I conclude I am awesome."
>>> phrase_list = phrase.split()
>>> phrase_list
['...', 'therefore,', 'I', 'conclude', 'I', 'am', 'awesome.']
>>> phrase_list = [ k.rstrip(',') for k in phrase_list]
>>> indicators_in_phrase = [indicator for indicator in indicators if indicator in phrase_list]
>>> indicators_in_phrase 
['therefore']
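The split-based answers compare single tokens, so a multi-word indicator (the question mentions phrases) could never match. One non-regex way to extend them, sketched here as an assumption rather than taken from any answer, is to compare each indicator's tokens against consecutive windows of the phrase's tokens. The helpers `tokenize` and `find_phrases` and the indicators "so that" and "i am" are hypothetical.

```python
import string

def tokenize(text):
    # Lowercase, split on whitespace, and trim punctuation from each token;
    # tokens that were pure punctuation (like "...") are dropped.
    tokens = (t.strip(string.punctuation).lower() for t in text.split())
    return [t for t in tokens if t]

def find_phrases(phrase, indicators):
    words = tokenize(phrase)
    found = []
    for ind in indicators:
        target = tokenize(ind)
        n = len(target)
        # Slide a window of n tokens across the phrase and compare exactly.
        if n and any(words[i:i + n] == target for i in range(len(words) - n + 1)):
            found.append(ind)
    return found

print(find_phrases("... therefore, I conclude I am awesome.",
                   ["therefore", "for", "so that", "i am"]))
# ['therefore', 'i am']
```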

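Francis Potter 14 years ago

Is the problem with "for" that it's inside "therefore", or that it isn't a word on its own? For example, if one of your indicators were "awe", would you want it included in indicators_in_phrase?

indicators = ["abc", "cde"]
phrase = "One abcde two"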
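Francis's example makes the distinction concrete: a plain substring test "finds" both indicators in "abcde", even though they overlap and neither is a word on its own, whereas a word-boundary regex finds neither. A short comparison:

```python
import re

indicators = ["abc", "cde"]
phrase = "One abcde two"

# Substring test: true whenever the characters appear anywhere.
print([ind for ind in indicators if ind in phrase])  # ['abc', 'cde']

# Whole-word test: the indicator must be bounded by non-word characters.
print([ind for ind in indicators
       if re.search(r'\b%s\b' % re.escape(ind), phrase)])  # []
```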

ghostdog74 14 years ago

You can remove the punctuation from your phrase and then split it so that every word stands alone. Then you can do plain string comparisons.

>>> import string
>>> indicators = ["therefore", "for", "since"]
>>> phrase = "... therefore, I conclude I am awesome."
>>> ''.join([ i for i in phrase.lower() if i not in string.punctuation]).strip().split()
['therefore', 'i', 'conclude', 'i', 'am', 'awesome']
>>> p = ''.join([ i for i in phrase.lower() if i not in string.punctuation]).strip().split()
>>> indicators_in_phrase = [indicator for indicator in indicators if indicator in p ]
>>> indicators_in_phrase
['therefore']
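In Python 3 the character-by-character join can be replaced by `str.translate` with a table from `str.maketrans` that deletes every character in `string.punctuation`. A sketch of that variant; like the join version, it also deletes punctuation inside words, so "don't" would become "dont".

```python
import string

indicators = ["therefore", "for", "since"]
phrase = "... therefore, I conclude I am awesome."

# The third argument to str.maketrans lists characters to delete.
table = str.maketrans('', '', string.punctuation)
words = phrase.lower().translate(table).split()
print(words)  # ['therefore', 'i', 'conclude', 'i', 'am', 'awesome']
print([ind for ind in indicators if ind in words])  # ['therefore']
```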