Regex to filter words by length, or to exclude a word, in Python

telnet regex python

Storo · 7 years ago

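I have been trying to figure this out, but since I'm a regex newbie I haven't managed to. I need to select the right lines from some telnet output like the following: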

systemstatus get resume    # line to exclude
systemstatus get idle      # line to filter
systemstatus get talking   # line to filter
systemstatus get ringing   # line to filter
systemstatus get outgoing  # line to filter
systemstatus get sleeping  # line to filter

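As you can see, I need to exclude the line with resume and select all the others. I know I can filter by length, but I only know how to filter by a length greater than some value, not by several specific lengths. For example, "systemstatus get \w{7,}" would exclude resume but also the idle line. So what I actually need is something that filters for lengths 4, 7 and 8.

Does anyone know how to do this?

Note: this has to be done in a regex, because of the telnet library.

Note 2: Because this is telnet, the moment systemstatus get resume arrives is not the same as the moment systemstatus get idle arrives (that is what I mean by "exclude"). So filtering on "systemstatus get <anything>" and then discarding "resume" afterwards would stop reading as soon as "resume" is received. I'm using telnet.expect([], timeout) from the telnet library.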

3 answers | 7 years ago

cs95 abhishek58g 7 years ago

Option 1
Call re.findall with the re.MULTILINE flag:

matches = re.findall(r"systemstatus get \b(?:\w{4}|\w{7,8})\b", t, re.M)

It returns the matches as a list of strings.

Regex details

systemstatus get    # literals
\b                  # word boundary
(?:                 # non-capturing group
\w{4}               # find a word of size 4 
|                   # regex OR pipe
\w{7,8}             # find a word of size 7 or 8
)
\b

Following your requirement, this matches by word length:

I need something that filters for lengths 4, 7 and 8.
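To make Option 1 concrete, here is a minimal sketch that runs the pattern against the sample lines from the question. The variable t and its contents are an assumption (the answer never defines t); the trailing comments from the question are dropped for brevity.

```python
import re

# Sample telnet output from the question, assumed to be held in `t`.
t = """\
systemstatus get resume
systemstatus get idle
systemstatus get talking
systemstatus get ringing
systemstatus get outgoing
systemstatus get sleeping
"""

# The word after "get " must be exactly 4, 7 or 8 word characters long.
pattern = re.compile(r"systemstatus get \b(?:\w{4}|\w{7,8})\b")

matches = pattern.findall(t)
print(matches)
# ['systemstatus get idle', 'systemstatus get talking',
#  'systemstatus get ringing', 'systemstatus get outgoing',
#  'systemstatus get sleeping']
```

"resume" has 6 characters, so neither alternative can end on a word boundary and that line is skipped.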

Option 2
Split the multiline string into separate lines, iterate over them, and call re.match on each one:

matches = []

for line in t.splitlines():
    if re.match(r"systemstatus get \b(?:\w{4}|\w{7,8})\b", line):
        matches.append(line)

heemayl 7 years ago

Use a zero-width negative lookahead, (?!resume(?:\s|$)), to make sure that resume does not follow systemstatus get:

^systemstatus get (?!resume(?:\s|$)).*$
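As a rough illustration of how the lookahead behaves, the sketch below applies the pattern to a few lines. The sample string is made up for the demonstration, and the "resumed" status is hypothetical; it is included only to show that the (?:\s|$) part rejects the exact word resume and nothing longer.

```python
import re

# re.MULTILINE lets ^ and $ anchor at every line, not just the whole string.
pattern = re.compile(r"^systemstatus get (?!resume(?:\s|$)).*$", re.MULTILINE)

sample = (
    "systemstatus get resume\n"
    "systemstatus get idle\n"
    "systemstatus get resumed\n"   # hypothetical status, not in the question
    "systemstatus get talking\n"
)

kept = pattern.findall(sample)
print(kept)
# ['systemstatus get idle', 'systemstatus get resumed', 'systemstatus get talking']
```

Unlike the length-based pattern, this one keeps working if a new status with 5, 6 or 9+ characters ever shows up.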


Jan 7 years ago

Regular expressions are powerful, but they aren't really needed here; just split, apply, and combine:

text = """
systemstatus get resume    # line to exclude
systemstatus get idle      # line to filter
systemstatus get talking   # line to filter
systemstatus get ringing   # line to filter
systemstatus get outgoing  # line to filter
systemstatus get sleeping  # line to filter
"""

lines = "\n".join(line for line in text.split("\n")
                  if line and "resume" not in line)
print(lines)

This yields

systemstatus get idle      # line to filter
systemstatus get talking   # line to filter
systemstatus get ringing   # line to filter
systemstatus get outgoing  # line to filter
systemstatus get sleeping  # line to filter

Unless you have lines like systemstatusresumesystem get idle (meaning resume appears without any word boundaries around it), there is no need for the overhead of a regex engine.
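The edge case can be checked directly. This small sketch compares a plain substring test with a word-boundary regex on that one line; the variable names are illustrative only.

```python
import re

line = "systemstatusresumesystem get idle"  # the edge case named above

# A substring test fires even though "resume" is buried inside a longer word.
substring_hit = "resume" in line

# \b requires a word boundary on both sides, so this does not fire.
word_hit = re.search(r"\bresume\b", line) is not None

print(substring_hit, word_hit)  # True False
```

So the substring filter would wrongly drop such a line, while a boundary-aware regex would keep it.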

Timing the different approaches (100,000 runs each) yields

print(timeit.timeit(noregex, number=10**5))
# 0.28622116599945 s

print(timeit.timeit(regex, number=10**5))
# 0.5753898609982571 s

So the non-regex solution takes roughly half the time.
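The answer does not show how noregex and regex were defined, so the following is only a plausible reconstruction for anyone who wants to repeat the measurement. Both function bodies and the choice of the lookahead pattern for the regex variant are assumptions; absolute timings will differ by machine and Python version.

```python
import re
import timeit

text = """
systemstatus get resume    # line to exclude
systemstatus get idle      # line to filter
systemstatus get talking   # line to filter
systemstatus get ringing   # line to filter
systemstatus get outgoing  # line to filter
systemstatus get sleeping  # line to filter
"""

# Assumed regex variant: the negative-lookahead pattern, precompiled once.
rx = re.compile(r"^systemstatus get (?!resume(?:\s|$)).*$", re.MULTILINE)

def noregex():
    # Split into lines, drop empty ones and any containing "resume".
    return "\n".join(line for line in text.split("\n")
                     if line and "resume" not in line)

def regex():
    # Collect every line the pattern accepts.
    return "\n".join(rx.findall(text))

print(timeit.timeit(noregex, number=10**5))
print(timeit.timeit(regex, number=10**5))
```

Both functions return the same five lines for this input, so the comparison is at least like for like.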