
Searching for text spread across elements in an HTML document

search html javascript

Jeff Saremi · Tech Community · 7 years ago

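Update for those who marked this question as a duplicate: the text I'm searching for may sit entirely inside one element, or it may be spread across 100 elements. I don't know which before I search. All I know is that the words in my search pattern come from this HTML. I need a search that skips over (but keeps track of) any HTML/JavaScript interleaved with the text I'm looking for.

I hope this explanation helps in finding an answer to my question.

*****End of update****

I'm looking for a library or a piece of code that can search for and locate arbitrary plain text in an HTML document, reporting where it was found (start/stop offsets or markers).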

Example:

Pattern to find: "text that I'm looking for"
HTML document:

<html>...<p>text that <b>I'm</b/> <span>looking
   for<div>...</div>...</p>

Resulting match:

text that <b>I'm</b/> <span>looking for

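Does anyone know of such a utility? Thanks

1 answer

dsharhon · 7 years ago

EDIT: Did some actual programming. This algorithm accepts HTML tags between the characters of a word, and a mixture of HTML tags and whitespace between words.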

const haystack = '<html>This, <b>that</b>, and\nthe<i>other</i>.</html>';
const needle = 'This, that, and the other.';

// Escape regex metacharacters so needle characters (e.g. '.') match literally
const escapeRegex = (ch) => ch.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');

// Make a regex from the needle...
let regex = '';

// ...split the needle into words...
const words = needle.trim().split(/\s+/);
for (let i = 0; i < words.length; i++) {
  const word = words[i];

  // ...allow HTML tags after each character except the last one in a word...
  for (let j = 0; j < word.length - 1; j++) {
    regex += escapeRegex(word.charAt(j)) + '(<[^>]+>)*';
  }
  regex += escapeRegex(word.charAt(word.length - 1));

  // ...allow a mixture of whitespace and HTML tags after each word except the last one
  if (i < words.length - 1) regex += '(\\s|<[^>]+>)+';
}

// Find the match, if any
const matches = haystack.match(regex);
console.log(matches);

// Report results
if (matches) {
  const match = matches[0];
  const offset = matches.index;

  console.log('Found match!');
  console.log('Offset: ' + offset);
  console.log('Length: ' + match.length);
  console.log(match);
}
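
For comparison, here is a minimal sketch of a different technique from the regex above: strip the tags first, but record for every kept character where it sat in the original HTML, then run a plain `indexOf` and map the hit back to start/stop offsets. The names `buildTextIndex` and `findInHtml` are made up for illustration, and the tag scanner is deliberately naive: it assumes every `<` opens a tag that ends at the next `>`, so comments, `<script>` bodies, and attribute values containing `>` would need a real parser.

```javascript
// Hypothetical helper: drop tags from `html`, collapse whitespace runs to one
// space, and remember the source index of every character that is kept.
function buildTextIndex(html) {
  let text = '';
  const offsets = []; // offsets[k] = index in `html` of text[k]
  let inTag = false;
  for (let i = 0; i < html.length; i++) {
    const ch = html[i];
    if (inTag) {
      if (ch === '>') inTag = false; // naive: a tag ends at the next '>'
      continue;
    }
    if (ch === '<') {
      inTag = true;
      continue;
    }
    if (/\s/.test(ch)) {
      if (text.endsWith(' ')) continue; // already have a space for this run
      text += ' ';
    } else {
      text += ch;
    }
    offsets.push(i);
  }
  return { text, offsets };
}

// Hypothetical helper: returns { start, end, match } with `end` exclusive,
// or null when the needle does not occur in the visible text.
function findInHtml(html, needle) {
  const normalized = needle.trim().split(/\s+/).join(' ');
  if (normalized.length === 0) return null;
  const { text, offsets } = buildTextIndex(html);
  const at = text.indexOf(normalized);
  if (at === -1) return null;
  const start = offsets[at];
  const end = offsets[at + normalized.length - 1] + 1;
  return { start, end, match: html.slice(start, end) };
}

// The example from the question (malformed </b/> tag included)
const html = "<html>...<p>text that <b>I'm</b/> <span>looking\n   for<div>...</div>...</p>";
const result = findInHtml(html, "text that I'm looking for");
console.log(result);
```

One difference in behavior worth noting: this sketch does not treat a tag as a word separator, so it would not find "the other" in `the<i>other</i>`, whereas the regex above would. Which behavior is right depends on whether the tags in question are inline or block-level.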
