新答案
如果您真正的html文档是有效的(并且有一个父/包含标记),那么最合适和可靠的技术将是使用适当的DOM解析器。
下面是如何使用DOMDocument和Xpath优雅地定位和替换指定的标记属性:
Demo
)
$domain = '//example.com';
$tagsAndAttributes = [
'img' => 'src',
'form' => 'action',
'a' => 'href'
];
$dom = new DOMDocument;
$dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$xpath = new DOMXPath($dom);
foreach ($tagsAndAttributes as $tag => $attr) {
foreach ($xpath->query("//{$tag}[not(starts-with(@{$attr}, '//'))]") as $node) {
$node->setAttribute($attr, $domain . $node->getAttribute($attr));
}
}
echo $dom->saveHTML();
代码2-带条件块的单个Xpath查询:(
Demo
)
$domain = '//example.com';
$targets = [
"//img[not(starts-with(@src, '//'))]",
"//form[not(starts-with(@action, '//'))]",
"//a[not(starts-with(@href, '//'))]"
];
$dom = new DOMDocument;
$dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$xpath = new DOMXPath($dom);
foreach ($xpath->query(implode('|', $targets)) as $node) {
if ($src = $node->getAttribute('src')) {
$node->setAttribute('src', $domain . $src);
} elseif ($action = $node->getAttribute('action')) {
$node->setAttribute('action', $domain . $action);
} else {
$node->setAttribute('href', $domain . $node->getAttribute('href'));
}
}
echo $dom->saveHTML();
旧答案:(…regex没有“DOM感知”,容易受到意外破坏)
如果我正确地理解了您,那么您心里有一个基本值,您只想将其应用于相对路径。
Pattern Demo
代码:(
Demo
)
$html=<<<HTML
<img src="/relative/url/img.jpg" />
<form action="/">
<a href='/relative/url/'>Note the Single Quote</a>
<img src="//site.com/protocol-relative-img.jpg" />
HTML;
$base='https://example.com';
echo preg_replace('~(?:src|action|href)=[\'"]\K/(?!/)[^\'"]*~',"$base$0",$html);
输出:
<img src="https://example.com/relative/url/img.jpg" />
<form action="https://example.com/">
<a href='https://example.com/relative/url/'>Note the Single Quote</a>
<img src="//site.com/protocol-relative-img.jpg" />
模式分解:
~ #Pattern delimiter
(?:src|action|href) #Match: src or action or href
= #Match equal sign
[\'"] #Match single or double quote
\K #Restart fullstring match (discard previously matched characters
/ #Match slash
(?!/) #Negative lookahead (zero-length assertion): must not be a slash immediately after first matched slash
[^\'"]* #Match zero or more non-single/double quote characters
~ #Pattern delimiter