Removing unclosed opening tags from an XHTML document

  •  6
  • Sangram Nandkhile Viktor Klang  · Tech Community  · 14 years ago

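    I have a large XHTML document with a lot of tags. I noticed that some unclosed opening paragraph tags are repeated unnecessarily, and I want to remove them or replace them with spaces. All I need is code that identifies the unclosed paragraph tags and removes them.

    Here is a small example to illustrate what I mean: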

    <p><strong>Company Registration No.1</strong> </p>
    <p><strong>Company Registration No.2</strong></p>
    
    <p>      <!-- extra tag -->
    <p>      <!-- extra tag -->
    
    <hr/>     
    
    <p><strong> HALL WOOD (LEEDS) LIMITED</strong><br/></p>
    <p><strong>REPORT AND FINANCIAL STATEMENTS </strong></p>
    

    2 answers  |  up to 14 years ago
        1
  •  3
  •   Richard J. Ross III    14 years ago

    This should work:

    using System;
    using System.Text;

    public static class XHTMLCleanerUpperThingy
    {
        private const string p = "<p>";
        private const string closingp = "</p>";
    
        public static string CleanUpXHTML(string xhtml)
        {
            StringBuilder builder = new StringBuilder(xhtml);
            int current;

            // visit each <p> once, resuming the search just past the previous one
            for (int idx = 0;
                 (current = xhtml.IndexOf(p, idx, StringComparison.Ordinal)) != -1;
                 idx = current + p.Length)
            {
                int idxofnext = xhtml.IndexOf(p, current + p.Length, StringComparison.Ordinal);
                int idxofclose = xhtml.IndexOf(closingp, current, StringComparison.Ordinal);
    
                // the tag is unclosed if no </p> follows it at all,
                // or if another <p> opens before the next </p>
                bool unclosed = idxofclose < 0
                    || (idxofnext >= 0 && idxofnext < idxofclose);

                if (unclosed)
                {
                    // overwrite the tag with spaces so all other offsets stay valid
                    for (int j = 0; j < p.Length; j++)
                    {
                        builder[current + j] = ' ';
                    }
                }
            }
    
            return builder.ToString();
        }
    }
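
    The code above only recognises the literal string <p>. As a rough sketch of one possible variant (not part of the original answer), the tags could be located with a regular expression instead, so that tags such as <p class="..."> are handled too and the stray tags are dropped rather than blanked out. The names UnclosedParagraphRemover and Remove and the pattern itself are illustrative assumptions. The sketch relies on paragraphs never nesting, and it does not skip <p> text that appears inside comments or CDATA sections.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class UnclosedParagraphRemover
{
    // Matches <p>, <p attr="...">, and </p>, but not <pre> or the
    // self-closing form <p />. The "close" group captures the slash of </p>.
    private static readonly Regex ParagraphTag =
        new Regex(@"<(?<close>/)?p(\s[^<>]*)?(?<!/)>");

    public static string Remove(string xhtml)
    {
        var result = new StringBuilder(xhtml.Length);
        int copiedUpTo = 0;   // everything before this index is already handled
        int openIndex = -1;   // start of the most recent unmatched <p>, or -1
        int openLength = 0;

        foreach (Match tag in ParagraphTag.Matches(xhtml))
        {
            if (tag.Groups["close"].Success)
            {
                // </p> closes the pending <p>, so that one stays
                openIndex = -1;
                continue;
            }

            if (openIndex >= 0)
            {
                // a second <p> arrived before any </p>: drop the earlier one
                result.Append(xhtml, copiedUpTo, openIndex - copiedUpTo);
                copiedUpTo = openIndex + openLength;
            }

            openIndex = tag.Index;
            openLength = tag.Length;
        }

        if (openIndex >= 0)
        {
            // the last <p> was never closed before the input ended
            result.Append(xhtml, copiedUpTo, openIndex - copiedUpTo);
            copiedUpTo = openIndex + openLength;
        }

        result.Append(xhtml, copiedUpTo, xhtml.Length - copiedUpTo);
        return result.ToString();
    }
}
```

    For example, UnclosedParagraphRemover.Remove("<p>a</p><p><p><hr/><p>b</p>") returns "<p>a</p><hr/><p>b</p>".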
    

        2
  •  2
  •   M.L.    14 years ago

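    You have to find out what kind of DOM tree actually gets created. The markup could be interpreted in either of the following two ways, as nested paragraphs or as empty ones: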

    <p><strong>Company Registration No.1</strong> </p>
    <p><strong>Company Registration No.2</strong></p>
    
    <p>      <!-- extra tag -->
      <p>      <!-- extra tag -->
        <hr/>     
        <p><strong> HALL WOOD (LEEDS) LIMITED</strong><br/></p>
        <p><strong>REPORT AND FINANCIAL STATEMENTS </strong></p>
      </p>
    </p>
    

    <p><strong>Company Registration No.1</strong> </p>
    <p><strong>Company Registration No.2</strong></p>
    
    <p></p>      <!-- extra tag -->
    <p></p>      <!-- extra tag -->
    <hr/>     
    <p><strong> HALL WOOD (LEEDS) LIMITED</strong><br/></p>
    <p><strong>REPORT AND FINANCIAL STATEMENTS </strong></p>
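
    If the second reading is the one you want, a hedged sketch of the repair is to insert </p> right after every <p> that is followed by another <p> (or by the end of the input) before any </p> shows up. EmptyParagraphCloser and CloseUnclosed are hypothetical names chosen for this illustration, and the sketch assumes that paragraphs never nest and that no <p> text hides inside comments.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class EmptyParagraphCloser
{
    // Matches <p>, <p attr="...">, and </p>, but not <pre> or <p />.
    private static readonly Regex ParagraphTag =
        new Regex(@"<(?<close>/)?p(\s[^<>]*)?(?<!/)>");

    public static string CloseUnclosed(string xhtml)
    {
        var result = new StringBuilder(xhtml.Length);
        int copiedUpTo = 0;  // everything before this index is already copied
        int openEnd = -1;    // index just past the latest unmatched <p>, or -1

        foreach (Match tag in ParagraphTag.Matches(xhtml))
        {
            if (tag.Groups["close"].Success)
            {
                // the pending <p> has its own </p>; nothing to repair
                openEnd = -1;
                continue;
            }

            if (openEnd >= 0)
            {
                // the previous <p> is still open: close it immediately
                result.Append(xhtml, copiedUpTo, openEnd - copiedUpTo).Append("</p>");
                copiedUpTo = openEnd;
            }

            openEnd = tag.Index + tag.Length;
        }

        if (openEnd >= 0)
        {
            // a <p> left open at the end of the input
            result.Append(xhtml, copiedUpTo, openEnd - copiedUpTo).Append("</p>");
            copiedUpTo = openEnd;
        }

        result.Append(xhtml, copiedUpTo, xhtml.Length - copiedUpTo);
        return result.ToString();
    }
}
```

    For the fragment "<p>a</p><p><p><hr/><p>b</p>" this yields "<p>a</p><p></p><p></p><hr/><p>b</p>". One way to check that the result is well-formed is to wrap it in a single root element and pass it to XDocument.Parse, which throws an XmlException on unbalanced markup.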