Delete lines between the last matching patterns

sed awk perl bash

Mikhail · 6 years ago

First of all, I am aware of these nice questions. Mine is a bit different: given the following text in file1:

Pattern 1
some text to keep
nice text here
Pattern 1
another text to keep
Pattern 1
REMOVE ME
AND ME
ME TOO PLEASE
Pattern 2

How can I delete only the lines between the last Pattern 1 and Pattern 2, including the pattern lines themselves, so that file1 now contains:

Pattern 1
some text to keep
nice text here
Pattern 1
another text to keep

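I would prefer a solution using sed, but any other solution (perl, bash, awk) is fine.

4 answers

ghoti · 6 years ago

I can't think of a simple and elegant way to do this in sed alone. It is possible in sed with write-only code, but I would need a really good reason to write something like that. :-)

You can still use sed for this, in combination with other tools: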

$ tac test.txt | sed '/^Pattern 2$/,/^Pattern 1$/d' | tac
Pattern 1
some text to keep
nice text here
Pattern 1
another text to keep

If your system doesn't have tac, you can create one:

$ alias tac="awk '{L[i++]=\$0} END {for(j=i-1;j>=0;)print L[j--]}'"

Or, in keeping with the theme:

$ alias tac='sed '\''1!G;h;$!d'\'
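One caveat when moving this into a script: bash does not expand aliases in non-interactive shells unless `expand_aliases` is set, so a shell function is usually the safer fallback there. A minimal sketch, using the sample data from the question; `reverse_lines` is a hypothetical name chosen for illustration, not a standard tool:

```shell
# reverse_lines is a hypothetical stand-in for tac: it prints its input
# (stdin or the named files) with the line order reversed.
reverse_lines() {
    awk '{ line[NR] = $0 } END { for (i = NR; i >= 1; i--) print line[i] }' "$@"
}

# Sample input from the question.
sample='Pattern 1
some text to keep
nice text here
Pattern 1
another text to keep
Pattern 1
REMOVE ME
AND ME
ME TOO PLEASE
Pattern 2'

# Reverse, delete the Pattern 2 .. Pattern 1 range, then reverse back.
printf '%s\n' "$sample" |
    reverse_lines |
    sed '/^Pattern 2$/,/^Pattern 1$/d' |
    reverse_lines
```

Because the stream is reversed, the sed range starts at Pattern 2 and ends at the nearest Pattern 1 above it in the original file, which is the last one.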

That said, I would do this in awk:

$ awk '/Pattern 1/{printf "%s",b;b=""} {b=b $0 ORS} /Pattern 2/{b=""} END{printf "%s",b}' test.txt
Pattern 1
some text to keep
nice text here
Pattern 1
another text to keep

Or split up for easier reading and commenting:

awk '
  /Pattern 1/ {          # If we find the start pattern,
    printf "%s",b        # print the buffer (or nothing if it's empty)
    b=""                 # and empty the buffer.
  }
  {                      # Add the current line to a buffer, with the
    b=b $0 ORS           # correct output record separator.
  }
  /Pattern 2/ {          # If we find our close pattern,
    b=""                 # just empty the buffer.
  }
  END {                  # And at the end of the file,
    printf "%s",b        # print the buffer if we have one.
  }' test.txt

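This is roughly the same as hek2mgl's solution, but ordered more sensibly and using ORS. :-)

Note that both of these solutions behave properly only if Pattern 2 occurs just once in the file. If you have multiple blocks, each with its own start and end pattern, you will need to work a bit harder. If that is the case, please provide more detail in your question.

choroba · 6 years ago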
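If the file does contain several Pattern 1 ... Pattern 2 blocks and only the last one should go, one possible approach is to read the file twice: the first pass records where the last closed block starts and ends, and the second pass skips those lines. This is only a sketch, assuming the patterns occupy whole lines and the input is a regular file (it is read twice, so a pipe will not work); `remove_last_block` and the variable names are made up for illustration:

```shell
# Hypothetical helper: print the file named by $1 without its last
# "Pattern 1" .. "Pattern 2" block (both pattern lines included).
remove_last_block() {
    awk '
        NR == FNR {
            # First pass: remember the most recent Pattern 1 line and,
            # each time a Pattern 2 closes it, record that block.
            if (/^Pattern 1$/) { p1 = FNR }
            else if (/^Pattern 2$/ && p1) { first = p1; last = FNR; p1 = 0 }
            next
        }
        # Second pass: print every line outside first..last.
        !(first && FNR >= first && FNR <= last)
    ' "$1" "$1"
}

# Usage with two blocks: only the second one is removed.
tmp=$(mktemp)
printf '%s\n' 'Pattern 1' 'keep A' 'Pattern 2' \
              'Pattern 1' 'keep B' 'Pattern 1' 'drop' 'Pattern 2' 'tail' > "$tmp"
remove_last_block "$tmp"
rm -f "$tmp"
```

A Pattern 2 with no open Pattern 1 before it is ignored here; that is a design choice of this sketch, not something the question specifies.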

perl -ne 'if    (/Pattern 1/) { print splice @buff; push @buff, $_ }
          elsif (/Pattern 2/) { @buff = () }
          elsif (@buff)       { push @buff, $_ }
          else                { print }
' -- file

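When you see Pattern 1, output all the lines accumulated so far and start pushing lines into @buff. When you see Pattern 2, clear the buffer. If the buffer has been started, push any other line onto it; otherwise print it (this covers text before the first Pattern 1 or after Pattern 2).

Note: the behaviour for a Pattern 2 without a preceding Pattern 1 was not specified.

hek2mgl · 6 years ago

With awk: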

awk '
# On pattern 1 and when the buffer is not empty, flush the buffer
/Pattern 1/ && b!="" { printf "%s", b; b="" }

# Append the current line and a newline to the buffer
{ b=b""$0"\n" }

# Clean the buffer on pattern 2
/Pattern 2/ { b="" }' file

potong · 6 years ago

This might work for you (GNU sed):

sed '/Pattern 1/,${//{x;//p;x;h};//!H;$!d;x;s/.*Pattern 2[^\n]*\n\?//;/^$/d}' file

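The general idea is to gather up lines starting from a line containing Pattern 1, flush them when another line containing Pattern 1 is encountered, and at end-of-file remove the lines between Pattern 1 and Pattern 2 and print whatever remains.

Focus on the lines between the first line containing Pattern 1 and the end of the file; all other lines are printed as normal. If a line contains Pattern 1, swap to the hold space and, if the lines held there also contain the same regexp, print them; then replace the hold space with the current line. If the current line does not contain the regexp, append it to the hold space and, unless it is the last line of the file, delete it. At end-of-file, swap to the hold space, remove everything up to and including the line containing Pattern 2, and print what remains.

N.B. A tricky situation arises when, as in your example, the line containing Pattern 2 is the last line of the file. Because sed uses newlines to delimit lines, it strips them before placing a line in the pattern space and appends one again before printing. If the pattern/hold space is empty, sed still appends a newline, which in this case would add a spurious empty line. The solution is to remove everything between Pattern 1 and Pattern 2, including any newline that follows the line containing Pattern 2. If there are further lines, they are printed as normal; if there are none, the hold space is now empty, and since it must have held something before, it is safe to delete.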
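The same script can be laid out one command per line with comments, which makes the steps above easier to follow. This is only a reformatting sketch of the one-liner, still assuming GNU sed (for `\n` inside a bracket expression and for `\?`); `strip_last_block` is a hypothetical wrapper name:

```shell
# Hypothetical wrapper: the GNU sed one-liner above, one command per line.
strip_last_block() {
    sed '
    /Pattern 1/,$ {
        # An empty regex // reuses the last regex applied: /Pattern 1/.
        // {
            # Swap: pattern space gets the buffered lines, hold gets this line.
            x
            # Print the buffered lines if they contain Pattern 1
            # (on the first Pattern 1 the buffer is still empty).
            //p
            # Swap back, then restart the buffer with the current line.
            x
            h
        }
        # Any other line is appended to the buffer.
        //!H
        # Unless this is the last line, print nothing and start the next cycle.
        $!d
        # Last line: fetch the buffer and drop everything through Pattern 2.
        x
        s/.*Pattern 2[^\n]*\n\?//
        # Nothing left: delete it so no stray empty line is printed.
        /^$/d
    }' "$@"
}

# Usage with the sample from the question.
printf '%s\n' 'Pattern 1' 'some text to keep' 'nice text here' \
              'Pattern 1' 'another text to keep' \
              'Pattern 1' 'REMOVE ME' 'AND ME' 'ME TOO PLEASE' 'Pattern 2' |
    strip_last_block
```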