
XML/HTML builder class using the SAX and W3C Java libraries cannot handle HTML input

saxparser xml html java

tomaytotomato · Tech Community · 10 years ago

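All,

I am developing a Java web service that retrieves archived data from an external database.

The requirement is that, besides returning the results as an XML message, the client can also request that the results be displayed in an HTML page.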

This diagram describes the high-level design:

[Figure: high-level design diagram]

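However, during implementation I realized that XML and HTML are not quite the same. For example:

HTML parsing allows certain elements to be left unclosed, whereas an XML parser requires every element to be closed.

This causes my current class implementation to throw the following errors when given HTML input: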

Error message

[03/Mar/2014:09:12:05] warning (19052):     CORE3283: stderr: [Fatal Error] web.html:1:3: The markup in the document preceding the root element must be well-formed.
[03/Mar/2014:09:12:05] warning (19052):     CORE3283: stderr: org.xml.sax.SAXParseException: The markup in the document preceding the root element must be well-formed.
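
The failure can be reproduced in isolation. The following is a minimal sketch, not the asker's code: the class name `WellFormedCheck` and the two sample strings are made up for illustration. It feeds the standard `DocumentBuilder` one XHTML-style fragment and one HTML-style fragment with an unclosed `<br>`. The JDK's default error handler typically also prints a `[Fatal Error]` line to stderr for the rejected input, similar to the first log line above.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class WellFormedCheck {
    // Returns true if the markup is well-formed XML, false if the XML parser rejects it.
    static boolean parsesAsXml(String markup) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.parse(new InputSource(new StringReader(markup)));
            return true;
        } catch (SAXException e) {
            // SAXParseException (a SAXException) signals markup that is not well-formed
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Every element is closed, so the XML parser accepts it
        System.out.println(parsesAsXml("<p>line one<br/>line two</p>")); // true
        // <br> is never closed: fine for a browser, a fatal error for an XML parser
        System.out.println(parsesAsXml("<p>line one<br>line two</p>"));  // false
    }
}
```

So a template can only go through `DocumentBuilder.parse` if it is written as well-formed XML; ordinary HTML templates need a parser that tolerates HTML's looser rules.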

My code

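import java.io.File;
import java.io.IOException;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.SAXException;
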

public class OutputBuilder
{
    private DocumentBuilderFactory docBF;
    private DocumentBuilder docBuilder;
    private Document doc;
    private static float UUID;
    private String docType;

    public OutputBuilder(String template, String output) throws ParserConfigurationException, SAXException, IOException
    {
        docBF = DocumentBuilderFactory.newInstance();
        docBuilder = docBF.newDocumentBuilder();

        //set the base document to the specified template file
        doc = docBuilder.parse(new File(template));
        //
        docType = output;
    }

    /*
     * Build the final document by adding values passed in from query results
     */
    public void fillTemplate(ResultSet qR) throws SQLException
    {
        if(docType.equals("html"))
        {
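            //find the designated point of data insertion to the html document
            //note: getElementById only matches attributes declared as type ID (e.g. by a DTD),
            //so it returns null for a plain id="..." attribute in a template without such a declaration
            Element appendPoint = doc.getElementById("archive_table");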

            //get meta data column names for table header row
            ResultSetMetaData rsmd = qR.getMetaData();

            //generate this first row which is the header
            Element headerRow = doc.createElement("tr");

            //create a column in the table header for each column in the query results
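            //JDBC column indexes are 1-based, so iterate from 1 to getColumnCount() inclusive
            for (int i = 1; i <= rsmd.getColumnCount(); i++)
            {
                Element tableH = doc.createElement("th");
                //setNodeValue has no effect on an Element; set the text content instead
                tableH.setTextContent(rsmd.getColumnName(i));
                headerRow.appendChild(tableH);
            }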

            //append header row to table
            appendPoint.appendChild(headerRow);

            //fill table body rows with query results

            while(qR.next())
            {
                //create a table row for each row in query results
                Element bodyRow = doc.createElement("tr"); 


                //fill that row with all column values in query results
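                //JDBC column indexes are 1-based
                for(int i = 1; i <= rsmd.getColumnCount(); i++)
                {
                    Element tableB = doc.createElement("td");
                    //setNodeValue has no effect on an Element; set the text content instead
                    tableB.setTextContent(qR.getString(i));
                    bodyRow.appendChild(tableB);
                }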
                //add each constructed row to the table
                appendPoint.appendChild(bodyRow);
            }

        }
        else
        {
            //do XML construction
        }

    }

}

What specific libraries or new logic do I need so that my class can handle both XML and HTML construction?

Any other suggestions are welcome!

Upvoted for a good sample question.

1 answer | as of 10 years ago

Antoniossss 10 years ago

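I suggest you start using a dedicated HTML parser for this kind of operation. Personally I use 大杀器; you can also use it to build your own HTML structure.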
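
If adding a dedicated parser is not an option, one possible fallback is to keep the template as well-formed XML (XHTML-style) and let the JDK choose the output syntax at serialization time. The sketch below rests on that assumption and is not the asker's code: the class `TemplateTableSketch`, its inline `TEMPLATE`, and the `render` and `row` helpers are hypothetical. It looks up the insertion point with XPath rather than `getElementById`, fills cells with `setTextContent`, and passes `"html"` or `"xml"` as the `Transformer` output method.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class TemplateTableSketch {
    // Hypothetical template; it must be well-formed XML for DocumentBuilder to accept it.
    static final String TEMPLATE =
        "<html><body><table id=\"archive_table\"></table></body></html>";

    // Fills the table and serializes the document; method is "html" or "xml".
    static String render(List<String> headers, List<List<String>> rows, String method) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(TEMPLATE)));
            // XPath matches on the attribute value, so no DTD ID declaration is needed
            Element table = (Element) XPathFactory.newInstance().newXPath()
                .evaluate("//*[@id='archive_table']", doc, XPathConstants.NODE);
            table.appendChild(row(doc, "th", headers));
            for (List<String> values : rows) {
                table.appendChild(row(doc, "td", values));
            }
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.METHOD, method);
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Builds one <tr> whose cells (th or td) hold the given values as text.
    static Element row(Document doc, String cellTag, List<String> values) {
        Element tr = doc.createElement("tr");
        for (String value : values) {
            Element cell = doc.createElement(cellTag);
            cell.setTextContent(value); // text is escaped by the serializer
            tr.appendChild(cell);
        }
        return tr;
    }

    public static void main(String[] args) {
        List<String> headers = Arrays.asList("id", "name");
        List<List<String>> rows = Arrays.asList(Arrays.asList("42", "a<b"));
        System.out.println(render(headers, rows, "html"));
        System.out.println(render(headers, rows, "xml"));
    }
}
```

In the real class the header and row values would come from `ResultSetMetaData` and `ResultSet` instead of in-memory lists. This approach does not help when the template itself is ordinary, non-well-formed HTML; that case still calls for an HTML-aware parser as suggested above.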