Best way to parse HTML tags in JavaScript

  •  0
  • Prabhu Murthi  · Tech Community  · 14 years ago

    Can anyone help or suggest a way to parse the HTML tags that appear inside the <body>...</body> tags?

    3 answers  |  last active 14 years ago
        1
  •  2
  •   John    14 years ago

    I take it you want to parse an HTML document with PHP. I suggest reading up on PHP's DOM extension: http://www.php.net/manual/en/book.dom.php

    Here is an example provided by PHPRO:

    <?php
    
    $html = '
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
        "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
    <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en-US" dir="ltr">
    <head>
    <title>PHPRO.ORG</title>
    </head>
    <body>
    <h2>Forecast for Saturday</h2>
    <!-- Issued at 0828 UTC Friday 23 May 2008 -->
    <table border="0" summary="Capital Cities Precis Forecast">
       <tbody>
          <tr>
             <td><a href="/products/IDN10064.shtml" title="Link to Sydney forecast">Sydney</a></td>
             <td title="Maximum temperature in degrees Celsius" class="max alignright">19&deg;</td>
             <td>Fine. Mostly sunny.</td>
          </tr>
    
          <tr>
             <td><a href="/products/IDV10450.shtml" title="Link to Melbourne forecast">Melbourne</a></td>
             <td title="Maximum temperature in degrees Celsius" class="max alignright">16&deg;</td>
             <td>Fog then fine.</td>
          </tr>
    
          <tr>
             <td><a href="/products/IDQ10095.shtml" title="Link to Brisbane forecast">Brisbane</a></td>
             <td title="Maximum temperature in degrees Celsius" class="max alignright">24&deg;</td>
             <td>Mostly fine</td>
          </tr>
    
          <tr>
             <td><a href="/products/IDW12300.shtml" title="Link to Perth forecast">Perth</a></td>
             <td title="Maximum temperature in degrees Celsius" class="max alignright">21&deg;</td>
             <td>Few showers, increasing later.</td>
          </tr>
    
          <tr>
             <td><a href="/products/IDS10034.shtml" title="Link to Adelaide forecast">Adelaide</a></td>
             <td title="Maximum temperature in degrees Celsius" class="max alignright">20&deg;</td>
             <td>Fine. Mostly sunny.</td>
          </tr>
    
          <tr>
             <td><a href="/products/IDT65061.shtml" title="Link to Hobart forecast">Hobart</a></td>
             <td title="Maximum temperature in degrees Celsius" class="max alignright">13&deg;</td>
             <td>Mainly fine.</td>
          </tr>
    
          <tr>
             <td><a href="/products/IDN10035.shtml" title="Link to Canberra forecast">Canberra</a></td>
             <td title="Maximum temperature in degrees Celsius" class="max alignright">15&deg;</td>
             <td>Fine, mostly sunny.</td>
          </tr>
    
          <tr>
             <td><a href="/products/IDD10150.shtml" title="Link to Darwin forecast">Darwin</a></td>
             <td title="Maximum temperature in degrees Celsius" class="max alignright">32&deg;</td>
             <td>Fine and sunny.</td>
          </tr>
    
       </tbody>
    </table>
    
    </body>
    </html>
    ';
    
        /*** a new dom object ***/
        $dom = new DOMDocument;
    
        /*** discard white space (must be set before loading to have any effect) ***/
        $dom->preserveWhiteSpace = false;
    
        /*** load the html into the object ***/
        $dom->loadHTML($html);
    
        /*** the table by its tag name ***/
        $tables = $dom->getElementsByTagName('table');
    
        /*** get all rows from the table ***/
        $rows = $tables->item(0)->getElementsByTagName('tr');
    
        /*** loop over the table rows ***/
        foreach ($rows as $row)
        {
            /*** get each column by tag name ***/
            $cols = $row->getElementsByTagName('td');
            /*** echo the values ***/
            echo $cols->item(0)->nodeValue.'<br />';
            echo $cols->item(1)->nodeValue.'<br />';
            echo $cols->item(2)->nodeValue;
            echo '<hr />';
        }
    ?> 
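    Since the question asks about JavaScript: the browser DOM has the same `getElementsByTagName` method the PHP example relies on, so the loop ports almost line for line. A minimal sketch, assuming `doc` is an already-parsed `Document` (for example the result of `new DOMParser().parseFromString(html, 'text/html')`); the function name `tableRows` is hypothetical. One difference: in the browser an element's `nodeValue` is `null`, so the sketch reads `textContent` where the PHP code reads `nodeValue`.

```javascript
// Hypothetical helper: returns the trimmed cell text of every row in the
// first <table> of `doc`, mirroring the PHP loop (table -> tr -> td -> text).
function tableRows(doc) {
  const table = doc.getElementsByTagName('table')[0];
  if (!table) return []; // no table in the document
  const rows = [];
  for (const tr of Array.from(table.getElementsByTagName('tr'))) {
    // textContent plays the role of PHP's nodeValue here.
    const cells = Array.from(tr.getElementsByTagName('td'), (td) => td.textContent.trim());
    rows.push(cells);
  }
  return rows;
}
```

    Run against the forecast markup above in a browser, each row should come back as something like `['Sydney', '19°', 'Fine. Mostly sunny.']`, with `&deg;` already decoded by the parser.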
    
        2
  •  2
  •   vsync Qantas 94 Heavy    14 years ago

    Do you mean something like John Resig's HTML parser?
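    As far as I recall, John Resig's parser (htmlparser.js) is a small pure-JavaScript library driven by callbacks: you call `HTMLParser(html, handler)` and it reports `start(tag, attrs, unary)`, `end(tag)`, `chars(text)` and `comment(text)` as it scans. The block below is not that library. It is a hypothetical, much smaller tokenizer (`miniParse`) written in the same SAX-like style, only to illustrate the idea. It ignores entities, doctypes, `<script>`/`<style>` contents and malformed markup, and an attribute value containing `>` will break it.

```javascript
// Hypothetical mini tokenizer in a callback (SAX-like) style; NOT Resig's code.
// One regex with an alternative per token kind: comment | end tag | start tag.
const TOKEN = /<!--([\s\S]*?)-->|<\/([a-zA-Z][\w-]*)\s*>|<([a-zA-Z][\w-]*)([^>]*?)(\/?)>/g;
// Attribute name, optionally followed by ="double", ='single' or an unquoted value.
const ATTR = /([^\s=\/]+)(?:\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s"'>]+)))?/g;

function parseAttrs(src) {
  const attrs = [];
  for (const m of src.matchAll(ATTR)) {
    // Boolean attributes (no "=value") get an empty string.
    attrs.push({ name: m[1].toLowerCase(), value: m[2] ?? m[3] ?? m[4] ?? '' });
  }
  return attrs;
}

function miniParse(html, handler) {
  let last = 0; // index just past the previous token
  for (const m of html.matchAll(TOKEN)) {
    // Anything between two tokens is reported as text.
    if (m.index > last && handler.chars) handler.chars(html.slice(last, m.index));
    if (m[1] !== undefined) {
      if (handler.comment) handler.comment(m[1]);
    } else if (m[2] !== undefined) {
      if (handler.end) handler.end(m[2].toLowerCase());
    } else if (handler.start) {
      // A trailing "/" before ">" marks a self-closing (unary) tag.
      handler.start(m[3].toLowerCase(), parseAttrs(m[4]), m[5] === '/');
    }
    last = m.index + m[0].length;
  }
  // Trailing text after the last token.
  if (last < html.length && handler.chars) handler.chars(html.slice(last));
}
```

    A single global regex with one alternative per token kind keeps the scan to one pass; the gaps between matches are the text nodes. Handlers are optional, so a caller interested only in tags can pass just `start`.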

        3
  •  0
  •   James Westgate    14 years ago

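    You could load the entire HTML document from another page via Ajax and then query it with jQuery selectors (if it is XHTML). I'm not sure whether this will work, though.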
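    A sketch of that approach that swaps jQuery's `$.get` and selectors for the standard `fetch` and `DOMParser` APIs. With `DOMParser` and the `'text/html'` type, the page does not need to be XHTML. The function name `selectFromPage` and the URL in the usage line are hypothetical, and the browser will only allow the request if the page is same-origin or sends CORS headers. `fetchFn` and `parse` are injectable purely so the logic can be exercised outside a browser.

```javascript
// Hypothetical helper: fetch another page, parse it, and return the trimmed
// text of every element matching a CSS selector.
async function selectFromPage(url, selector, {
  fetchFn = (u) => fetch(u),
  parse = (html) => new DOMParser().parseFromString(html, 'text/html'),
} = {}) {
  const response = await fetchFn(url);
  if (!response.ok) throw new Error(`HTTP ${response.status} for ${url}`);
  const doc = parse(await response.text());
  // querySelectorAll returns a static NodeList; keep only the text.
  return Array.from(doc.querySelectorAll(selector), (el) => el.textContent.trim());
}
```

    In a browser, `selectFromPage('/forecast.html', 'td a').then((names) => console.log(names));` would log the link text of every `<a>` inside a table cell on that page.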