
How do I extract a specific paragraph from a file using regular expressions in Python?

paragraph extract regex python

hoperose · 7 years ago

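My problem is extracting a particular paragraph (typically a middle one, for example) from a file using regular expressions in Python.

A sample file looks like this: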

poem = """The time will come
when, with elation,
you will greet yourself arriving
at your own door, in your own mirror,
and each will smile at the other's welcome,
and say, sit here. Eat.
You will love again the stranger who was your self.
Give wine. Give bread. Give back your heart
to itself, to the stranger who has loved you

all your life, whom you ignored
for another, who knows you by heart.
Take down the love letters from the bookshelf,

the photographs, the desperate notes,
peel your own image from the mirror.
Sit. Feast on your life."""

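How can I use a regular expression in Python to extract the second paragraph of this poem (that is, from "all your life" through "bookshelf,")?

3 answers

Aaditya Ura · 7 years ago

Use a capture group and try the following: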

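import re

# Start at a line beginning with "all" and match through "bookshelf,".
# re.MULTILINE lets ^ match at the start of any line;
# re.DOTALL lets . match newlines so the match can span several lines.
pattern = r'^(all.*bookshelf[,\s])'

second = re.search(pattern, poem, re.MULTILINE | re.DOTALL)
print(second.group(1))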
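
The pattern above is tied to the words "all" and "bookshelf", so it only fits this particular poem. As a more general sketch, assuming paragraphs are separated by blank lines and lines end with \n, the text could be split on blank lines and the paragraph picked by index. The helper name `nth_paragraph` is hypothetical, and `poem` here is an abbreviated copy of the text in the question.

```python
import re

# Abbreviated copy of the poem from the question (assumption: \n line endings).
poem = """Give wine. Give bread. Give back your heart
to itself, to the stranger who has loved you

all your life, whom you ignored
for another, who knows you by heart.
Take down the love letters from the bookshelf,

the photographs, the desperate notes,
peel your own image from the mirror.
Sit. Feast on your life."""


def nth_paragraph(text, n):
    """Return the n-th paragraph (0-based), splitting on blank lines."""
    # A newline, optional whitespace, then another newline marks a paragraph break.
    paragraphs = re.split(r'\n\s*\n', text.strip())
    return paragraphs[n]


second = nth_paragraph(poem, 1)
print(second)
```

Indexing the list of paragraphs avoids hard-coding any words from the text, at the cost of raising IndexError when the requested paragraph does not exist.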

Sweeper · 7 years ago

Use a positive lookbehind and a positive lookahead:

(?<=\n\n).+(?=\n\n)

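The first part, (?<=\n\n), is a lookbehind: it matches only at a position that has \n\n immediately before it.

The last part, (?=\n\n), is a lookahead: it matches what comes before it only if \n\n follows.

Try it out: https://regex101.com/r/7XnDjS/1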
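
To try the same idea from Python, here is a minimal sketch. Two adjustments are assumptions made for illustration and are not part of the answer above: re.DOTALL is passed so that . can cross the line breaks inside the paragraph (without it, .+ stays on one line and finds nothing in this poem), and the lazy .+? is used so the match stops at the first blank line even when more paragraphs follow. `poem` is an abbreviated copy of the text in the question.

```python
import re

# Abbreviated copy of the poem from the question (assumption: \n line endings).
poem = """Give wine. Give bread. Give back your heart
to itself, to the stranger who has loved you

all your life, whom you ignored
for another, who knows you by heart.
Take down the love letters from the bookshelf,

the photographs, the desperate notes,
peel your own image from the mirror.
Sit. Feast on your life."""

# Lookbehind: a blank line must precede the match.
# Lookahead: a blank line must follow the match.
# Neither lookaround consumes the newlines, so they are not part of the result.
pattern = re.compile(r'(?<=\n\n).+?(?=\n\n)', re.DOTALL)

match = pattern.search(poem)
second = match.group(0) if match else None
print(second)
```

Because both lookarounds require a blank line, this form cannot return the first or last paragraph of a text; it only finds paragraphs in between.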

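Ken Schumack · 7 years ago

Some Windows text files end each line with \r\n rather than just \n. Python has excellent documentation on regular expressions; just google "python regexp". You can even google "perl regexp", since Python copied regexps from Perl ;-) One way to get the second paragraph is to use () to capture the text between two runs of two or more line endings, like this: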

myPattern = re.compile(r'[^\r\n]+\r?\n\r?\n+([^\r\n]+)\r?\n\r?\n.*')

Then use it like this:

secondPara = myPattern.sub(r"\1", content)

Here is my script in action:

schumack@linux2 137> ./poem2.py
secondPara: all your life, whom you ignored for another, who knows you by heart. Take down the love letters from the bookshelf,
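
The capture group in the pattern above is `[^\r\n]+`, which covers a single line, so it fits input where each paragraph sits on one line, as the output shown suggests. For text whose paragraphs span several lines, one possible variant is to match runs of consecutive non-empty lines instead. This is a sketch under assumptions: the names `PARAGRAPH` and `second_paragraph` are hypothetical, a line containing only spaces is not treated as blank, and the lines are joined with spaces only to mimic the output above.

```python
import re

# A paragraph is one or more consecutive non-empty lines.
# \r?\n accepts both Unix (\n) and Windows (\r\n) line endings.
PARAGRAPH = re.compile(r'[^\r\n]+(?:\r?\n[^\r\n]+)*')


def second_paragraph(content):
    """Return the second paragraph as one line, or None if there is none."""
    paragraphs = PARAGRAPH.findall(content)
    if len(paragraphs) < 2:
        return None
    # splitlines() handles both \n and \r\n inside the paragraph.
    return ' '.join(paragraphs[1].splitlines())


# Abbreviated sample with Windows line endings (assumption for illustration).
content = (
    "to itself, to the stranger who has loved you\r\n"
    "\r\n"
    "all your life, whom you ignored\r\n"
    "for another, who knows you by heart.\r\n"
    "Take down the love letters from the bookshelf,\r\n"
    "\r\n"
    "the photographs, the desperate notes,\r\n"
)

secondPara = second_paragraph(content)
print("secondPara:", secondPara)
```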
