
Find the 5 words before a matched word in Python

    Asked by user1631306 · 6 years ago

    For example, I have the string "This is the most Absurd rat in the history".

    I want to search for "rat" and then get the 4 words that come before the matched word "rat".

    I tried:

    import re
    re.search(r'\brat\b', " This is the most Absurd rat in the history")
    

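    But that only gives me the character span, e.g. span=(25, 28). How do I use it to get the words? If I knew the position of the word in the word list, I could simply take the 4 words before it by index.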
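    One way to follow the idea in the question: the number of whitespace-separated words in the text before m.start() is the index of "rat" in s.split(), and from that index the 4 preceding words are a plain slice. A minimal sketch, assuming "rat" is preceded by whitespace rather than glued to punctuation; the variable names are illustrative.

```python
import re

s = " This is the most Absurd rat in the history"
words = s.split()  # ['This', 'is', 'the', 'most', 'Absurd', 'rat', ...]

m = re.search(r'\brat\b', s)
if m is not None:
    # Counting the words that appear before the match start gives the index of
    # "rat" in `words` (assumes "rat" is preceded by whitespace, not punctuation)
    idx = len(s[:m.start()].split())
    print(idx, words[idx])             # 5 rat
    # max(0, ...) avoids a negative slice start when "rat" is near the beginning
    print(words[max(0, idx - 4):idx])  # ['is', 'the', 'most', 'Absurd']
```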

    3 answers
    Answer 1 · Ajax1234 · 6 years ago

    You can use re.findall:

    import re
    s = "This is the most Absurd rat ever in the history"
    # Greedy [\w\W]+ grabs everything up to the last " rat"; keep the final 4 words
    print(re.findall(r'^[\w\W]+(?=\srat\b)', s)[0].split()[-4:])
    

    ['is', 'the', 'most', 'Absurd']
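    Two details of this pattern are worth knowing: [\w\W]+ is greedy, so with several occurrences it stops at the last " rat" (a lazy +? stops at the first one instead), and findall returns an empty list when there is no match, so [0] raises IndexError. A minimal sketch handling both, assuming whitespace-separated words; the helper name last_four_before and its lazy parameter are hypothetical.

```python
import re


def last_four_before(text, lazy=False):
    """Return the 4 words before " rat": last occurrence by default, first if lazy=True.

    Hypothetical helper for illustration; returns [] if no whitespace-preceded "rat" exists.
    """
    # [\w\W] matches any character (including newlines); "+?" makes the repeat lazy
    quantifier = '+?' if lazy else '+'
    matches = re.findall(r'^[\w\W]' + quantifier + r'(?=\srat\b)', text)
    if not matches:  # avoid IndexError on matches[0]
        return []
    return matches[0].split()[-4:]


s = "one two three four five rat a b c d rat"
print(last_four_before(s))             # ['a', 'b', 'c', 'd']
print(last_four_before(s, lazy=True))  # ['two', 'three', 'four', 'five']
```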
    

    Edit 2:

    If you are looking for the four words before each occurrence of "rat", you can use itertools.groupby:

    import itertools
    s = "Some words go here rat This is the most Absurd rat final case rat"
    # Group consecutive words into runs of "rat" (True) and non-"rat" words (False)
    new_data = [[a, list(b)] for a, b in itertools.groupby(s.split(), key=lambda x: x.lower() == 'rat')]
    if any(a for a, _ in new_data):  # to ensure that "rat" does exist in the string
        # Keep the last 4 words of every non-"rat" run that is followed by a "rat" run
        results = [new_data[i][-1][-4:] for i in range(len(new_data) - 1) if new_data[i + 1][0]]
        print(results)
    

    [['Some', 'words', 'go', 'here'], ['is', 'the', 'most', 'Absurd'], ['final', 'case']]
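    For comparison, a plain index-based version without groupby: walk over s.split() with enumerate and slice the 4 words before every "rat". Unlike the groupby version, these windows are not cut off at the previous "rat", so when two occurrences are fewer than 4 words apart the window includes the earlier "rat" (visible in the last result below). A sketch only; the helper name windows_before is hypothetical.

```python
def windows_before(text, target='rat', n=4):
    """Return the n words before each occurrence of `target` (case-insensitive).

    Hypothetical helper for illustration; words are split on whitespace only.
    """
    words = text.split()
    return [
        words[max(0, i - n):i]  # clamp at 0 so a "rat" near the start yields a shorter list
        for i, w in enumerate(words)
        if w.lower() == target
    ]


s = "Some words go here rat This is the most Absurd rat final case rat"
print(windows_before(s))
# [['Some', 'words', 'go', 'here'], ['is', 'the', 'most', 'Absurd'], ['Absurd', 'rat', 'final', 'case']]
```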
    
    Answer 2 · Eric Duminil · 6 years ago

    (?:\S+\s){4}(?=rat\b) might be close to what you want:

    >>> sentence = "This is the most Absurd rat in the history"
    >>> import re
    >>> re.findall(r'(?:\S+\s){4}(?=rat\b)', sentence, re.I)
    ['is the most Absurd ']
    >>> re.findall(r'(?:\S+\s){4}(?=rat\b)', "I like Bratwurst", re.I)
    []
    >>> re.findall(r'(?:\S+\s){4}(?=rat\b)', "A B C D rat D E F G H rat", re.I)
    ['A B C D ', 'E F G H ']
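    The same pattern can be parameterized by the target word and the number of words, with re.escape protecting any regex metacharacters in the word, and each match split into a list. A minimal sketch, assuming the target word starts and ends with word characters (so \b behaves as expected); preceding_words is a hypothetical name. As with the fixed {4} version, an occurrence with fewer than n single-space-separated words before it produces no result.

```python
import re


def preceding_words(text, word, n=4):
    """Return, for each whole-word occurrence of `word`, the n words right before it.

    Hypothetical helper generalizing the fixed {4} / "rat" pattern above.
    """
    # Non-capturing groups only, so findall returns the full matched strings
    pattern = r'(?:\S+\s){%d}(?=%s\b)' % (n, re.escape(word))
    return [chunk.split() for chunk in re.findall(pattern, text, re.I)]


print(preceding_words("A B C D rat D E F G H rat", "rat"))
# [['A', 'B', 'C', 'D'], ['E', 'F', 'G', 'H']]
print(preceding_words("This is the most Absurd rat in the history", "rat", n=2))
# [['most', 'Absurd']]
```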
    


    Answer 3 · mVChr · 6 years ago

    To get the four words before every "rat", use findall:

    import re
    s = 'This is the most absurd rat ever in the history of rat kind I tell you this rat is ridiculous.'
    # Capture exactly 4 whitespace-separated words right before each whole-word "rat"
    answer = [sub.split() for sub in re.findall(r'((?:\S+\s+){4})rat\b', s)]
    # [['is', 'the', 'most', 'absurd'],
    #  ['in', 'the', 'history', 'of'],
    #  ['I', 'tell', 'you', 'this']]
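    If you also want a result for a "rat" that has fewer than 4 words before it, one option is re.finditer plus each match's start offset: split the text before the match and keep at most the last 4 words. A sketch under the assumption that words are whitespace-separated; words_before_each is a hypothetical helper name.

```python
import re


def words_before_each(text, word='rat', n=4):
    """For every whole-word match of `word`, return up to n words that precede it.

    Hypothetical helper; a match with fewer than n words before it still
    produces a (shorter, possibly empty) list instead of being skipped.
    """
    result = []
    for m in re.finditer(r'\b%s\b' % re.escape(word), text):
        # Split everything before the match start; keep at most the last n words
        result.append(text[:m.start()].split()[-n:])
    return result


s = 'rat one two rat This is the most absurd rat'
print(words_before_each(s))
# [[], ['rat', 'one', 'two'], ['is', 'the', 'most', 'absurd']]
```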
    

    Previous answer:

    You can use re.split:

    import re
    s = 'This is the most Absurd rat ever in the history'
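    # Split once at the first whole-word "rat", then keep the last 4 words of the part before it
    answer = re.split(r'\brat\b', s, maxsplit=1)[0].split()[-4:]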
    # => ['is', 'the', 'most', 'Absurd']
    

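    To get the 4 words after "rat" instead, use [1] instead of [0] and [:4] instead of [-4:]. You will also need to add some code to check that "rat" actually occurs in the string.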
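    Both points can be combined: if "rat" is absent, re.split returns a one-element list, so indexing [0] would silently give the last 4 words of the whole string and [1] would raise IndexError. A minimal sketch that checks for that case and returns the words on both sides; around is a hypothetical helper name, and words are assumed to be whitespace-separated.

```python
import re


def around(text, word='rat', n=4):
    """Return (n words before, n words after) the first whole-word `word`, or None if absent.

    Hypothetical helper built on the re.split approach above.
    """
    parts = re.split(r'\b%s\b' % re.escape(word), text, maxsplit=1)
    if len(parts) < 2:  # nothing was split, so `word` does not occur in text
        return None
    before, after = parts
    return before.split()[-n:], after.split()[:n]


s = 'This is the most Absurd rat ever in the history'
print(around(s))           # (['is', 'the', 'most', 'Absurd'], ['ever', 'in', 'the', 'history'])
print(around('no match'))  # None
```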