
Java, XML, XSLT: Prevent DTD validation

  •  5
  • theomega  · Tech Community  · 15 years ago

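    I use the Java (6) XML API to apply an XSLT transformation to an HTML document from the web. The document is well-formed XHTML and therefore contains a valid DTD declaration ( <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> ). Now a problem occurs: during the transformation, the XSLT processor tries to download the DTD, and the W3 server rejects the request with an HTTP 503 error (due to bandwidth limitation by W3).

    How can I prevent the XSLT processor from downloading the DTD? I don't need the input document to be validated.

    The source is: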

    import java.io.StringReader;
    import javax.xml.transform.Source;
    import javax.xml.transform.Transformer;
    import javax.xml.transform.TransformerFactory;
    import javax.xml.transform.stream.StreamResult;
    import javax.xml.transform.stream.StreamSource;
    

    ——

       String xslt = "<?xml version=\"1.0\"?>"+
       "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"+
       "    <xsl:output method=\"text\" />"+          
       "    <xsl:template match=\"//html/body//div[@id='bodyContent']/p[1]\"> "+
       "        <xsl:value-of select=\".\" />"+
       "     </xsl:template>"+
       "     <xsl:template match=\"text()\" />"+
       "</xsl:stylesheet>";
    
       try {
       Source xmlSource = new StreamSource("http://de.wikipedia.org/wiki/Right_Livelihood_Award");
       Source xsltSource = new StreamSource(new StringReader(xslt));
       TransformerFactory ft = TransformerFactory.newInstance();
    
       Transformer trans = ft.newTransformer(xsltSource);
    
       trans.transform(xmlSource, new StreamResult(System.out));
       }
       catch (Exception e) {
         e.printStackTrace();
       }
    

    I have read the following questions here, but they all use a different XML API:

    Thanks!

    5 answers  |  last active 11 years ago
        1
  •  5
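  •   Riduidel    11 years ago

    I recently ran into this problem when unmarshalling XML with JAXB. The answer was to create a SAXSource from an XMLReader and an InputSource, then pass it to the JAXB Unmarshaller's unmarshal() method. To avoid loading the external DTD, I set a custom EntityResolver on the XMLReader.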

    SAXParserFactory spf = SAXParserFactory.newInstance();
    SAXParser sp = spf.newSAXParser();
    XMLReader xmlr = sp.getXMLReader();
    xmlr.setEntityResolver(new EntityResolver() {
        public InputSource resolveEntity(String pid, String sid) throws SAXException {
            if (sid.equals("your remote dtd url here"))
                return new InputSource(new StringReader("actual contents of remote dtd"));
            throw new SAXException("unable to resolve remote entity, sid = " + sid);
        } } );
    SAXSource ss = new SAXSource(xmlr, myInputSource);
    

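    As written, this custom entity resolver throws an exception if it is ever asked to resolve an entity other than the one you expect. If you would rather let the parser go ahead and load such remote entities, replace the `throw` line with `return null;`, which tells the parser to resolve the entity in its default way.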
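
    A minimal sketch of that lenient variant, for illustration only. The class and method names (`LocalDtdResolverDemo`, `lenientResolver`, `countElements`) are made up for this example, and the empty string stands in for the DTD contents, so entities that the real DTD declares would not be defined:

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class LocalDtdResolverDemo {
    // System ID of the DTD to intercept (the one from the question's DOCTYPE).
    static final String XHTML_DTD =
        "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd";

    // Serves an empty stand-in for the known DTD (assumption: the DTD contents
    // are not needed). Returning null asks the parser to resolve any other
    // entity in its default way instead of failing.
    static EntityResolver lenientResolver() {
        return new EntityResolver() {
            public InputSource resolveEntity(String publicId, String systemId) {
                if (XHTML_DTD.equals(systemId)) {
                    return new InputSource(new StringReader(""));
                }
                return null;
            }
        };
    }

    // Parses the XML with the resolver installed and counts its elements.
    static int countElements(String xml) throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setEntityResolver(lenientResolver());
        final int[] count = {0};
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                count[0]++;
            }
        });
        reader.parse(new InputSource(new StringReader(xml)));
        return count[0];
    }
}
```

    Parsing a document whose DOCTYPE points at the XHTML Transitional DTD with this resolver should not contact the W3 server, because the resolver answers the request for that system ID locally.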

        2
  •  3
  •   Kai Woska    15 years ago

    Try setting a feature on the DocumentBuilderFactory:

    URL url = new URL(urlString);
    InputStream is = url.openStream();
    DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
    dbf.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
    DocumentBuilder db;
    db = dbf.newDocumentBuilder();
    Document result = db.parse(is);
    

    Right now I am running into the same problem inside XSLT (2) itself, when calling the document function to analyse external XHTML pages.
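
    The snippet above stops at the parsed Document. A hedged sketch of how it could feed the transformation from the question follows; `NoDtdDomTransform` and `transform` are names invented here, the input is taken from a string rather than a URL, and the feature is assumed to be supported, which holds for Xerces-derived parsers such as the JDK's built-in one but not necessarily for others:

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class NoDtdDomTransform {
    // Xerces-specific feature; other parsers may reject it.
    static final String LOAD_EXTERNAL_DTD =
        "http://apache.org/xml/features/nonvalidating/load-external-dtd";

    // Parses xml into a DOM without fetching the external DTD,
    // then applies the stylesheet and returns the result as a string.
    static String transform(String xml, String xslt) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true); // a DOM used as XSLT input should be namespace-aware
        dbf.setFeature(LOAD_EXTERNAL_DTD, false);
        Document doc = dbf.newDocumentBuilder()
                          .parse(new InputSource(new StringReader(xml)));

        Transformer transformer = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(xslt)));
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }
}
```

    To use a URL as in the question, the `InputSource` could be created from the stream returned by `url.openStream()` instead of a `StringReader`.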

        3
  •  2
  •   Peter Borbas    11 years ago

    The previous answers led me to a solution, but it was not obvious to me, so here is a complete one:

    private void convert(InputStream xsltInputStream, InputStream srcInputStream, OutputStream destOutputStream) throws SAXException, ParserConfigurationException,
            TransformerFactoryConfigurationError, TransformerException, IOException {
        //create a parser with a fake entity resolver to disable DTD download and validation
        XMLReader xmlReader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        xmlReader.setEntityResolver(new EntityResolver() {
            public InputSource resolveEntity(String pid, String sid) throws SAXException {
                return new InputSource(new ByteArrayInputStream(new byte[] {}));
            }
        });
        //create the transformer
        Source xsltSource = new StreamSource(xsltInputStream);
        Transformer transformer = TransformerFactory.newInstance().newTransformer(xsltSource);
        //create the source for the XML document which uses the reader with fake entity resolver
        Source xmlSource = new SAXSource(xmlReader, new InputSource(srcInputStream));
        transformer.transform(xmlSource, new StreamResult(destOutputStream));
    }
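
    As a possible variation on the method above, the fake entity resolver could be swapped for the load-external-dtd feature from Kai Woska's answer, set directly on the XMLReader. This is only a sketch: `NoDtdSaxTransform`, `withoutExternalDtd` and `transform` are hypothetical names, and the feature is assumed to be recognised by the parser (it is Xerces-specific, so other parsers may throw a SAXNotRecognizedException):

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

class NoDtdSaxTransform {
    // Xerces-specific feature; other parsers may reject it.
    static final String LOAD_EXTERNAL_DTD =
        "http://apache.org/xml/features/nonvalidating/load-external-dtd";

    // Wraps the input in a SAXSource whose reader skips the external DTD.
    static Source withoutExternalDtd(InputSource input) throws Exception {
        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true); // deliver namespace-aware events to the XSLT processor
        XMLReader reader = spf.newSAXParser().getXMLReader();
        reader.setFeature(LOAD_EXTERNAL_DTD, false);
        return new SAXSource(reader, input);
    }

    // Applies the stylesheet to the input and returns the result as a string.
    static String transform(InputSource input, String xslt) throws Exception {
        Transformer transformer = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(xslt)));
        StringWriter out = new StringWriter();
        transformer.transform(withoutExternalDtd(input), new StreamResult(out));
        return out.toString();
    }
}
```

    For the page from the question, the input would be `new InputSource("http://de.wikipedia.org/wiki/Right_Livelihood_Award")`.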
    
        4
  •  0
  •   user668834    13 years ago

    If you use

    DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
    

    you can try to disable DTD validation with the following code:

     dbf.setValidating(false);
    
        5
  •  -1
  •   Chris    15 years ago

    You need to use javax.xml.parsers.DocumentBuilderFactory:

    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    factory.setValidating(false);
    DocumentBuilder builder = factory.newDocumentBuilder();
    InputSource src = new InputSource("http://de.wikipedia.org/wiki/Right_Livelihood_Award");
    // parse the InputSource itself: getByteStream() is null when only a system ID is set
    Document xmlDocument = builder.parse(src);
    DOMSource source = new DOMSource(xmlDocument);
    TransformerFactory tf = TransformerFactory.newInstance();
    // xsltSource is the stylesheet Source from the question
    Transformer transformer = tf.newTransformer(xsltSource);
    transformer.transform(source, new StreamResult(System.out));