To clarify the title: I'm trying to find a row in an HTML document that is enclosed by the following string (or some close variant of it):
<!--Copy from here-->
and that contains exactly one occurrence of a certain unique filename. Let's call it
SN12345.htm
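The document I'm searching contains several such rows, and I'm having trouble isolating just one row between the "Copy" strings. The following regex matches almost the entire document (with the single-line flag enabled):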
\<!-+Copy [A-Za-z ]+-+\>(.*SN12345\.htm.*)\<!-+Copy [A-Za-z ]+-+\>
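I want it to match only up to the first occurrence of the Copy string after the filename (and to start at the last Copy string before it). How can I do this? I'm using Python.
Here is some sample input: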
<!--Copy from here-->
<TR valign=top>
<TD><A href="SN_2100-24.htm">2100-24</A></TD>
<TD>2100 Bioanalyzer - peak find problem when using new Ambion RNA ladder Cat. #7152</TD>
<TD>11/04</TD><td valign=top><p align="center">I</p></td></tr>
<!--Copy from here-->
<TR valign=top>
<TD><A href="SN_2100-23.htm">2100-23</A></TD>
<TD>2100 Bioanalyzer communication problems when both Biosizing and 2100 Expert SW are active</TD>
<TD>10/04</TD><td valign=top><p align="center">I</p></td></tr>
<!--Copy from here-->
<TR valign=top>
<TD><A href="SN_2100-22.htm">2100-22</A></TD>
<TD>Incompatibility of 2100 Expert and Microsoft Windows XP Service Pack 2</TD>
<TD>09/04</TD><td valign=top><p align="center">I</p></td></tr>
<!--Copy from here-->
<TR valign=top>
<TD><A href="SN_2100-21.htm">2100-21</A></TD>
<TD>2100 Bioanalyzer - DNA LabChip Kits and detergent containing PCR buffer</TD>
<TD>04/04</TD><td valign=top><p align="center">I</p></td></tr>
<!--Copy from here-->
<TR valign=top>
<TD><A href="SN_2100-20.htm">2100-20</A></TD>
<TD>General PC system and settings requirements for 2100 expert software</TD>
<TD>04/04</TD><td valign=top><p align="center">I</p></td></tr>
<!--Copy from here-->
<TR valign=top>
<TD>2100-19</A></TD>
<TD>not used</TD>
<TD>01/04</TD><td valign=top><p align="center">I</p></td></tr>
<!--Copy from here-->
<TR valign=top>
<TD><A href="SN_2100-18.htm">2100-18</A></TD>
<TD>RNA 6000 Pico Kits - ART® Aerosol Resistant Tips generate baseline abnormalities</TD>
<TD >01/04</TD><td valign=top><p align="center">I</p></td></tr>
<!--Copy from here-->
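A likely reason the regex above overmatches: with the single-line flag, the leading marker matches the first `Copy` comment in the document, and the greedy `.*` runs on to the last one. One way to confine the match to a single row is to replace each `.*` with a "tempered greedy token" that refuses to step over the start of another `Copy` marker. The following is a minimal sketch, not a definitive solution; it assumes the markers always look like `<!--Copy ...-->` as in the sample, and the names `find_row` and `SAMPLE` (a trimmed excerpt of the input above) are hypothetical, for illustration only:

```python
import re
from typing import Optional

# Marker pattern from the question: "<!--", "Copy", one or more words, "-->"
MARKER = r"<!-+Copy [A-Za-z ]+-+>"
# Tempered greedy token: match any character (newlines included) one at a
# time, but only where a new "Copy" marker does not begin. This keeps a
# match from spanning more than one row.
NO_MARKER = r"(?:(?!<!-+Copy )[\s\S])*?"


def find_row(html: str, filename: str) -> Optional[str]:
    """Return the text between the two Copy markers that enclose `filename`,
    or None if no such pair of markers exists (hypothetical helper)."""
    pattern = re.compile(
        MARKER + "(" + NO_MARKER + re.escape(filename) + NO_MARKER + ")" + MARKER
    )
    match = pattern.search(html)
    return match.group(1) if match else None


# Hypothetical trimmed excerpt of the sample input
SAMPLE = """<!--Copy from here-->
<TR valign=top>
<TD><A href="SN_2100-23.htm">2100-23</A></TD>
<TD>10/04</TD><td valign=top><p align="center">I</p></td></tr>
<!--Copy from here-->
<TR valign=top>
<TD><A href="SN_2100-22.htm">2100-22</A></TD>
<TD>09/04</TD><td valign=top><p align="center">I</p></td></tr>
<!--Copy from here-->
"""

row = find_row(SAMPLE, "SN_2100-22.htm")
print(row)
```

Because the lookahead blocks both runs of "any character", each match is limited to the text between two adjacent markers, so no single-line flag is needed (`[\s\S]` already matches newlines). `re.escape` keeps the `.` in the filename from matching arbitrary characters.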