
Regex substitution in Python: a simpler way?

  •  43
  • Evan Fosmark  · 15 years ago

    Whenever I want to replace part of a larger piece of text, I always have to do something like this:

    "(?P<start>some_pattern)(?P<replace>foo)(?P<end>end)"
    

    and then concatenate the start group, the new value for replace, and then the end group.

    Is there a better way to do this?
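    For reference, the concatenation step described in the question can be handed to `re.sub` itself by using named-group backreferences (`\g<start>`, `\g<end>`). This is a minimal sketch, not the asker's code. It treats `some_pattern`, `foo` and `end` as literal placeholders, and the helper `replace_middle`, the sample text and the value `bar` are hypothetical.

```python
import re

# The question's pattern: three named groups around the part to change.
PATTERN = r"(?P<start>some_pattern)(?P<replace>foo)(?P<end>end)"

def replace_middle(text, new_value):
    """Hypothetical helper: swap only the "replace" group for new_value."""
    # Double any backslashes so re.sub inserts new_value literally
    # instead of treating them as escapes or group references.
    safe = new_value.replace("\\", "\\\\")
    # \g<start> and \g<end> copy those groups back unchanged.
    return re.sub(PATTERN, r"\g<start>" + safe + r"\g<end>", text)

result = replace_middle("xx some_patternfooend yy", "bar")
print(result)  # xx some_patternbarend yy
```

    The string is still rebuilt from three pieces, but `re.sub` does the joining for every match.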

    4 answers  |  last activity 11 years ago
        1
  •  105
  •   Roger Pate  · 15 years ago
    >>> import re
    >>> s = "start foo end"
    >>> s = re.sub("foo", "replaced", s)
    >>> s
    'start replaced end'
    >>> s = re.sub("(?<= )(.+)(?= )", lambda m: "can use a callable for the %s text too" % m.group(1), s)
    >>> s
    'start can use a callable for the replaced text too end'
    >>> help(re.sub)
    Help on function sub in module re:
    
    sub(pattern, repl, string, count=0)
        Return the string obtained by replacing the leftmost
        non-overlapping occurrences of the pattern in string by the
        replacement repl.  repl can be either a string or a callable;
        if a callable, it's passed the match object and must return
        a replacement string to be used.
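    The `count` argument in the help text above limits how many matches are replaced. `re.subn` does the same job as `re.sub` but also reports how many substitutions were made. Here is a short sketch using a hypothetical sample string:

```python
import re

text = "foo foo foo"

# count=0 (the default) replaces every non-overlapping match.
all_replaced = re.sub("foo", "bar", text)

# count=1 stops after the leftmost match; passed by keyword here.
first_only = re.sub("foo", "bar", text, count=1)

# re.subn returns a (new_string, number_of_substitutions) tuple.
new_text, n = re.subn("foo", "bar", text)

print(all_replaced)  # bar bar bar
print(first_only)    # bar foo foo
print(new_text, n)   # bar bar bar 3
```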
    
        2
  •  18
  •   zenazn  · 15 years ago

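    Look at the re documentation for lookahead (?=...) and lookbehind (?<=...). They check the surrounding text without consuming it, so only the part you want to replace ends up in the match.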
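    Applied to the question's pattern, a lookaround version might look like the sketch below. It treats the placeholders `some_pattern`, `foo` and `end` as literal text, and the sample string is hypothetical. Python's `re` requires the lookbehind part to be fixed-width, which a literal string is.

```python
import re

# Only "foo" is consumed by the match; the lookbehind and lookahead
# merely require "some_pattern" before it and "end" after it,
# so there is nothing to re-insert by hand.
PATTERN = r"(?<=some_pattern)foo(?=end)"

text = "some_patternfooend and a lone fooend"
result = re.sub(PATTERN, "bar", text)
print(result)  # some_patternbarend and a lone fooend
```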

        3
  •  11
  •   Ben Blank  · 15 years ago

    Python's re module has one limitation you can't change: lookbehind works, but only with fixed-width patterns.

    >>> import re
    >>> re.sub("(?<=foo)bar(?=baz)", "quux", "foobarbaz")
    'fooquuxbaz'
    >>> re.sub("(?<=fo+)bar(?=baz)", "quux", "foobarbaz")
    
    Traceback (most recent call last):
      File "<pyshell#2>", line 1, in <module>
        re.sub("(?<=fo+)bar(?=baz)", "quux", string)
      File "C:\Development\Python25\lib\re.py", line 150, in sub
        return _compile(pattern, 0).sub(repl, string, count)
      File "C:\Development\Python25\lib\re.py", line 241, in _compile
        raise error, v # invalid expression
    error: look-behind requires fixed-width pattern
    

    This means you have to work around it, and the simplest workaround is very similar to what you're doing now:

    >>> re.sub("(fo+)bar(?=baz)", "\\1quux", "foobarbaz")
    'fooquuxbaz'
    >>>
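    >>> # If you need to turn this into a callable function:
    >>> def replace(start, old, end, replacement, search):
    ...     # start/old/end are escaped, so they are matched as literal text
    ...     pattern = "(" + re.escape(start) + ")" + re.escape(old) + "(?=" + re.escape(end) + ")"
    ...     # a callable inserts `replacement` verbatim, with no backslash processing
    ...     return re.sub(pattern, lambda m: m.group(1) + replacement, search)
    ...
    >>> replace("foo", "bar", "baz", "quux", "foobarbaz")
    'fooquuxbaz'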
    

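    If you read what an expert has to say on the matter (he's talking about JavaScript, which at the time had no lookbehind at all, but many of the principles are the same), you'll see that his simplest solution looks a lot like this one.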
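    The `"\\1quux"` replacement used above has one pitfall. If the replacement text itself begins with a digit, `\1` followed by that digit is read as a reference to a different group (for example `\12` means group 12). The `re` documentation's `\g<1>` form avoids this ambiguity. Here is a minimal sketch with a hypothetical replacement `"2quux"`:

```python
import re

text = "foobarbaz"

# Capture the variable-width prefix instead of using lookbehind and
# write it back with \g<1>. Written as r"\12quux", the template would
# refer to group 12, which doesn't exist, and re.sub raises re.error.
result = re.sub(r"(fo+)bar(?=baz)", r"\g<1>2quux", text)
print(result)  # foo2quuxbaz

# With a callable, the replacement is used literally, so there is no
# backreference parsing to worry about.
result2 = re.sub(r"(fo+)bar(?=baz)", lambda m: m.group(1) + "2quux", text)
print(result2)  # foo2quuxbaz
```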

        4
  •  4
  •   aplavin  · 12 years ago

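    I think the best approach is to capture whatever you want to replace in a group, then use the start and end positions of that captured group to splice in the replacement.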

    Regards,

    阿德林

    import re

    # The pattern contains the expression we want to replace as its first group
    pat = r"word1\s(.*)\sword2"
    test = "word1 will never be a word2"
    repl = "replace"

    m = re.search(pat, test)

    if m:
        # Splice the replacement between the text before and after group 1
        line = test[:m.start(1)] + repl + test[m.end(1):]
        print(line)
    else:
        print("the pattern didn't match")
    

    With test = "word1 will never be a word2", this prints "word1 replace word2".
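    The slicing idea above only handles the first match that `re.search` finds. The minimal sketch below applies the same idea to every match by passing `re.sub` a callable. The helper name `replace_group` is hypothetical. It also uses a non-greedy `.*?`, so that each word1...word2 pair is matched separately.

```python
import re

def replace_group(pattern, repl, text, group=1):
    """Hypothetical helper: replace only `group` in every match of `pattern`."""
    def _swap(m):
        if m.start(group) == -1:  # group did not take part in this match
            return m.group(0)
        # Offsets of the group relative to the start of the whole match.
        start = m.start(group) - m.start()
        end = m.end(group) - m.start()
        whole = m.group(0)
        return whole[:start] + repl + whole[end:]
    return re.sub(pattern, _swap, text)

result = replace_group(r"word1\s(.*?)\sword2", "replace",
                       "word1 a word2 / word1 b c word2")
print(result)  # word1 replace word2 / word1 replace word2
```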