
Parse TEI with DOMXPath: get the text of a child tag inside an evaluate() loop

  •  0
  • steplab  · Tech community  · 7 years ago

    I need to get the text of a child tag (<head>) inside the current div.

    Sample TEI file:

    <div type="lib" n="1"><head>LIBER I</head>...
    <div type="pr">...</div>
    <div type="cap" n="1"><head>CAP EX</head><p><milestone unit="par" n="1" />...<milestone unit="par" n="2" />...</div>
    <div type="cap" n="2"><head>CAP EX</head><milestone unit="par" n="1" />...<milestone unit="par" n="2" />...</div>
    </div>
    

    I tried this, without success:

    // source file:
    $fulltext = '<div type="lib" n="1"><head>LIBER I</head>...<div type="pr">...</div><div type="cap" n="1"><head>CAP EX</head><p><milestone unit="par" n="1" />...<milestone unit="par" n="2" />...</div><div type="cap" n="2"><head>CAP EX</head><milestone unit="par" n="1" />...<milestone unit="par" n="2" />...</div></div>';
    $dom = new DOMDocument();
    @$dom->loadHTML($fulltext);
    $domx = new DOMXPath($dom);
    $entries = $domx->evaluate("//div");
    echo '<ul>';
    foreach ($entries as $entry) {
        $title = '';
        $type = $entry->getAttribute('type');
        $n = $entry->getAttribute('n');
        $head = $domx->evaluate("string(./head[1])", $entry);
        if ($head != '') $title = $head; else $title = $n;
        echo '<li><a href="#'.$type.'-'.$n.'">'.$title.'</a></li>';
    }
    echo '</ul>';
    

    The line that does not work:

    $head = $domx->evaluate("string(./head[1])",$entry);
    

     DOMDocument::loadHTML(): htmlParseStartTag: misplaced <head> tag in Entity, line: 3
    

    The purpose of this line is to get the text of the child tag head inside the loop (in this case "LIBER I").

    2 answers  |  as of 7 years ago
        1
  •  0
  •   Nigel Ren    7 years ago

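    Using @ only suppresses the warning; loadHTML() still parses the input as HTML, where <head> is not allowed inside a <div>.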

    However, if you change the line to

    $dom->loadXML($fulltext);
    

    the output gives you what you need.
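
    A minimal sketch of that change, assuming a well-formed version of the sample: the closed <p> elements and the placeholder text are illustrative, not the asker's data. It also swaps @ for libxml_use_internal_errors(), so parse errors can be inspected instead of discarded.

```php
<?php
// Hypothetical, well-formed stand-in for the TEI sample in the question.
$fulltext = '<div type="lib" n="1"><head>LIBER I</head>'
    . '<div type="pr"><p>preface</p></div>'
    . '<div type="cap" n="1"><head>CAP EX</head><p><milestone unit="par" n="1" />text</p></div>'
    . '</div>';

// Collect libxml errors instead of silencing them with @.
libxml_use_internal_errors(true);

$dom = new DOMDocument();
if (!$dom->loadXML($fulltext)) {
    // Report why the document could not be parsed as XML.
    foreach (libxml_get_errors() as $error) {
        echo trim($error->message), "\n";
    }
    libxml_clear_errors();
    exit(1);
}

$domx = new DOMXPath($dom);
$titles = [];
foreach ($domx->evaluate('//div') as $entry) {
    // string() yields '' when the div has no <head> child.
    $head = $domx->evaluate('string(./head[1])', $entry);
    // Fall back to the n attribute, as in the question's loop.
    $titles[] = $head !== '' ? $head : $entry->getAttribute('n');
}

print_r($titles);
```

    The relative path ./head[1] is resolved against the context node passed as the second argument to evaluate(), so each div reports only its own direct <head> child.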

        2
  •  0
  •   steplab    7 years ago

    Solved using XMLReader:

    $level = 0;
    $bc = array();            // breadcrumb parts: nesting level => n
    $indici_bc = array();     // dotted breadcrumb for each div/milestone
    $indici_head = array();   // <head> text for each div/milestone ('' if none)
    $last_blocco = '';        // name of the previously handled element
    $passed_milestone = false;
    $xml = new XMLReader();
    $xml->open($pathTei);     // $pathTei: path to the TEI file
    //$xml->xml($testo);
    while ($xml->read()) {
        if ($xml->nodeType == XMLReader::END_ELEMENT && $xml->name == 'div') {
            $level--;
            $last_blocco = $xml->name;
            if ($passed_milestone) { $level--; $passed_milestone = false; }
        }
        if ($xml->nodeType == XMLReader::ELEMENT && ($xml->name == 'div' || $xml->name == 'milestone')) {
            $blocco = $xml->name;
            $type = $xml->getAttribute('type');
            $n = $xml->getAttribute('n');
            // getAttribute() returns null for a missing attribute;
            // isset() cannot be applied to a method call, so use ?? instead.
            $unit = $xml->getAttribute('unit') ?? '';

            // here I get the child node
            $node = new SimpleXMLElement($xml->readOuterXML());
            $head = $node->head ? (string)$node->head : '';

            $indici_head[] = $head;
            if ($last_blocco != 'milestone') $level++;
            if ($blocco == 'div') $bc[$level] = $n; else $bc[($level + 1)] = $n;
            $bc_str = '';
            for ($j = 1; $j < $level; $j++) {
                if ($bc_str != '') $bc_str .= '.';
                $bc_str .= $bc[$j];
            }
            if ($bc_str != '') $bc_str .= '.';
            $bc_str .= $n;

            $last_blocco = $xml->name;
            if ($blocco == 'milestone') $passed_milestone = true;

            $indici_bc[] = $bc_str;
        }
    }
    $xml->close();
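
    The key step above, reading the child <head> of the element the reader is currently on, can be isolated in a short sketch. The input string and the type-n keys are assumptions made for illustration; the breadcrumb logic is left out.

```php
<?php
// Hypothetical, well-formed stand-in for the TEI file.
$tei = '<div type="lib" n="1"><head>LIBER I</head>'
    . '<div type="cap" n="1"><head>CAP EX</head><p>text</p></div>'
    . '<div type="cap" n="2"><p>text</p></div>'
    . '</div>';

$reader = new XMLReader();
$reader->XML($tei); // parse from a string instead of open($path)

$heads = [];
while ($reader->read()) {
    if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'div') {
        // Parse only the current element's subtree; the reader position is unchanged.
        $node = new SimpleXMLElement($reader->readOuterXml());
        // ->head matches direct <head> children only, not those of nested divs.
        $key = $reader->getAttribute('type') . '-' . $reader->getAttribute('n');
        $heads[$key] = isset($node->head) ? (string) $node->head : '';
    }
}
$reader->close();

print_r($heads);
```

    Because readOuterXml() serializes the whole subtree, every nested div is parsed again for each of its ancestors. That is acceptable for small files, but on a large TEI document it may be worth tracking the <head> elements directly in the read loop.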