
How to match a text node and then follow the parent node using XPath

  •  14
  • Mat  · Tech Community  · 15 years ago

    I want to match a title node by its text and then select the corresponding content node.

    <doc>
        <block>
            <title>Text 1</title>
            <content>Stuff I want</content>
        </block>
    
        <block>
            <title>Text 2</title>
            <content>Stuff I don't want</content>
        </block>
    </doc>
    

    >>> from lxml import etree
    >>>
    >>> tree = etree.XML("<doc><block><title>Text 1</title><content>Stuff I want</content></block><block><title>Text 2</title><content>Stuff I don't want</content></block></doc>")
    >>>
    >>> # get all titles
    ... tree.xpath('//title/text()')
    ['Text 1', 'Text 2']
    >>>
    >>> # match 'Text 1'
    ... tree.xpath('//title/text()="Text 1"')
    True
    >>>
    >>> # Follow parent from selected nodes
    ... tree.xpath('//title/text()/../..//text()')
    ['Text 1', 'Stuff I want', 'Text 2', "Stuff I don't want"]
    >>>
    >>> # Follow parent from selected node
    ... tree.xpath('//title/text()="Text 1"/../..//text()')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "lxml.etree.pyx", line 1330, in lxml.etree._Element.xpath (src/lxml/lxml.etree.c:14542)
      File "xpath.pxi", line 287, in lxml.etree.XPathElementEvaluator.__call__ (src/lxml/lxml.etree.c:90093)
      File "xpath.pxi", line 209, in lxml.etree._XPathEvaluatorBase._handle_result (src/lxml/lxml.etree.c:89446)
      File "xpath.pxi", line 194, in lxml.etree._XPathEvaluatorBase._raise_eval_error (src/lxml/lxml.etree.c:89281)
    lxml.etree.XPathEvalError: Invalid type
    

    Is this possible in XPath? Do I need to express what I want to do in a different way?

    2 answers  |  as of 15 years ago
        1
  •  23
  •   Johannes Weiss    15 years ago

    Is this what you want?

    //title[text()='Text 1']/../content/text()
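
A minimal sketch of running this expression with lxml against the sample document from the question. The variable names (`XML`, `tree`, `wanted`) are illustrative and not part of the original answer.

```python
from lxml import etree

# Sample document from the question.
XML = (
    "<doc>"
    "<block><title>Text 1</title><content>Stuff I want</content></block>"
    "<block><title>Text 2</title><content>Stuff I don't want</content></block>"
    "</doc>"
)
tree = etree.XML(XML)

# The predicate [text()='Text 1'] filters the title elements but still
# yields a node-set, so further location steps such as /.. can follow it.
wanted = tree.xpath("//title[text()='Text 1']/../content/text()")
print(wanted)  # ['Stuff I want']
```

The difference from the failing attempt in the question is that the comparison sits inside a predicate. Written outside a predicate, `//title/text()="Text 1"` evaluates to a boolean (the `True` shown in the question's session), not to a set of nodes.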
    
        2
  •  16
  •   Mathias Müller    10 years ago

    Use:

    string(/*/*/title[. = 'Text 1']/following-sibling::content)
    

    This is an improvement over the currently accepted solution by Johannes Weiss in at least two ways:

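    1. The very expensive abbreviation "//" (which usually causes the entire XML document to be scanned) is avoided, as it should be whenever the structure of the XML document is known in advance.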

    2. There is no step back up to the parent.
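
As a rough illustration, the `string(...)` expression can be evaluated with lxml in the same way as the queries in the question. This is a sketch that assumes the sample document from the question. The names `result` and `missing` and the title `No such title` are made up for the example.

```python
from lxml import etree

# Sample document from the question.
tree = etree.XML(
    "<doc>"
    "<block><title>Text 1</title><content>Stuff I want</content></block>"
    "<block><title>Text 2</title><content>Stuff I don't want</content></block>"
    "</doc>"
)

# /*/*/title walks a fixed two-level path instead of scanning with //, and
# following-sibling::content reaches the content element without going
# back up through the parent.
result = tree.xpath("string(/*/*/title[. = 'Text 1']/following-sibling::content)")
print(result)  # Stuff I want

# string() of an empty node-set is the empty string, so a title that does
# not exist produces '' rather than an error.
missing = tree.xpath("string(/*/*/title[. = 'No such title']/following-sibling::content)")
print(repr(missing))  # ''
```

Because the outermost function is `string()`, the call returns a single string rather than a list of text nodes, which may or may not be what the caller wants when several blocks share the same title.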