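This is actually fairly hard to do, because arbitrary HTML pages are often malformed (the major browsers are quite forgiving). You may want to take a look at the Swing HTML parser (the javax.swing.text.html.parser package that ships with the JDK). I have never tried it, but it looks like it may be the best option. You could also try the following and handle any parsing exceptions that come up. Note that I have only tried this with XML, and since this parser expects well-formed XML, it will likely throw a SAXException on pages that are not well-formed: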
import java.io.File;
import java.io.IOException;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.*;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
...
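try {
    DocumentBuilderFactory docBuilderFactory = DocumentBuilderFactory.newInstance();
    DocumentBuilder docBuilder = docBuilderFactory.newDocumentBuilder();
    // Placeholder: pass the InputStream you built earlier from the HTTP request
    Document doc = docBuilder.parse(InputStreamYouBuiltEarlierFromAnHTTPRequest);
}
catch (ParserConfigurationException e)
{
    // The factory could not create a parser with the requested configuration
    ...
}
catch (SAXException e)
{
    // The input could not be parsed, e.g. it is not well-formed
    ...
}
catch (IOException e)
{
    // Reading from the stream failed
    ...
}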
...
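For the Swing HTML parser route mentioned at the top, here is a minimal sketch of what it might look like. I have not used it in production, so treat it as a starting point only. The class name LenientLinkExtractor, the method extractLinks, and the sample markup are all made up for illustration; ParserDelegator and HTMLEditorKit.ParserCallback are the JDK classes. It collects the href of every anchor tag from markup that a strict XML parser would reject:

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper class, named here only for illustration.
final class LenientLinkExtractor {

    // Collects the href value of every <a> start tag the parser reports.
    static List<String> extractLinks(Reader html) throws IOException {
        final List<String> links = new ArrayList<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (tag == HTML.Tag.A) {
                    Object href = attrs.getAttribute(HTML.Attribute.HREF);
                    if (href != null) {
                        links.add(href.toString());
                    }
                }
            }
        };
        // The third argument tells the parser to ignore any charset declared in the page.
        new ParserDelegator().parse(html, callback, true);
        return links;
    }

    public static void main(String[] args) throws IOException {
        // Made-up sample: unclosed <p> tags and an unclosed <a>, no <html> or <body>.
        String malformed = "<p>one<a href=\"/a\">first</a><p>two<a href=\"/b\">second";
        System.out.println(extractLinks(new StringReader(malformed)));
    }
}
```

In real use you would wrap the InputStream from the HTTP request in an InputStreamReader with the right charset instead of a StringReader. The callback also has handleText, handleEndTag and handleSimpleTag methods you can override if you need more than links.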