
Apache POI XWPF: check whether a run contains a picture

xwpf xmlbeans apache-poi xpath java


Phil · Tech Community · 7 years ago

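My goal is to process .docx documents in Java using Apache POI. I want to extract all the content of a document in order to create a new document that contains only specific parts, which I select from the processed document. So far this works for tables and text, but I have a problem with pictures. Normally I extract them like this: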

List<XWPFPictureData> images = r.getEmbeddedPictures();

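where r is taken from a paragraph and is of type XWPFRun. The big problem here is that this approach only works for some images, depending on how the image was inserted into the Word document.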

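I can access the XML of the run and tried to look for the image there. This works well in Python, where you can simply state an XPath query. I tried the same thing in Java, but got an error message.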

Here is the code I use to check whether a run contains an image:

r.getCTR().selectPath(".//w:drawing/wp:inline/a:graphic/a:graphicData/pic:pic/pic:blipFill/a:blip/@r:embed");

It throws the following exception:

1 reply | until 6 years ago


Axel Richter · 7 years ago

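XmlObject.selectPath takes an XPath expression, and all available XPath engines are namespace-aware. The prefixes used in the path (w, wp, a, pic, r) must therefore be declared before the expression.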

import java.io.FileInputStream;

import org.apache.poi.xwpf.usermodel.*;
import org.apache.xmlbeans.XmlObject;

public class WordRunSelectPath {

 public static void main(String[] args) throws Exception {

  // Bind every prefix used in the path to its namespace URI.
  String declareNameSpaces =   "declare namespace w='http://schemas.openxmlformats.org/wordprocessingml/2006/main'; "
                     + "declare namespace wp='http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing'; "
                     + "declare namespace a='http://schemas.openxmlformats.org/drawingml/2006/main'; "
                     + "declare namespace pic='http://schemas.openxmlformats.org/drawingml/2006/picture'; "
                     + "declare namespace r='http://schemas.openxmlformats.org/officeDocument/2006/relationships' ";

  // try-with-resources closes the document even if an exception is thrown.
  try (XWPFDocument document = new XWPFDocument(new FileInputStream("WordInsertPictures.docx"))) {
   for (XWPFParagraph paragraph : document.getParagraphs()) {
    for (XWPFRun run : paragraph.getRuns()) {
     // Select the r:embed attribute of inline pictures in this run.
     XmlObject[] selectedObjects = run.getCTR().selectPath(
                          declareNameSpaces
                        + ".//w:drawing/wp:inline/a:graphic/a:graphicData/pic:pic/pic:blipFill/a:blip/@r:embed");
     if (selectedObjects.length > 0) {
      // The attribute value is the relationship ID of the picture part.
      String rID = selectedObjects[0].newCursor().getTextValue();
      System.out.println(rID);
     }
    }
   }
  }
 }
}
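
To illustrate why the prefixes have to be bound, here is a minimal sketch that runs the same path with the JDK's own javax.xml.xpath API instead of XMLBeans. This is a swapped-in technique for illustration only, not what POI or XMLBeans does internally. The class name RunPictureXPath, the method embedIds and the hand-written SAMPLE_RUN fragment are all hypothetical. The fragment only imitates the shape of a run with an inline picture and is not taken from a real document.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class RunPictureXPath {

    // Same prefix-to-URI bindings as the "declare namespace" clauses above.
    static final Map<String, String> NAMESPACES = new HashMap<>();
    static {
        NAMESPACES.put("w", "http://schemas.openxmlformats.org/wordprocessingml/2006/main");
        NAMESPACES.put("wp", "http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing");
        NAMESPACES.put("a", "http://schemas.openxmlformats.org/drawingml/2006/main");
        NAMESPACES.put("pic", "http://schemas.openxmlformats.org/drawingml/2006/picture");
        NAMESPACES.put("r", "http://schemas.openxmlformats.org/officeDocument/2006/relationships");
    }

    // Hypothetical, hand-written run fragment with one inline picture.
    static final String SAMPLE_RUN =
          "<w:r xmlns:w='" + NAMESPACES.get("w") + "'"
        + " xmlns:wp='" + NAMESPACES.get("wp") + "'"
        + " xmlns:a='" + NAMESPACES.get("a") + "'"
        + " xmlns:pic='" + NAMESPACES.get("pic") + "'"
        + " xmlns:r='" + NAMESPACES.get("r") + "'>"
        + "<w:drawing><wp:inline><a:graphic><a:graphicData><pic:pic><pic:blipFill>"
        + "<a:blip r:embed='rId4'/>"
        + "</pic:blipFill></pic:pic></a:graphicData></a:graphic></wp:inline></w:drawing>"
        + "</w:r>";

    // Returns the r:embed relationship IDs of inline pictures in the given run XML.
    static List<String> embedIds(String runXml) throws Exception {
        // The parser must be namespace-aware, otherwise prefixed names cannot match.
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(runXml)));

        XPath xpath = XPathFactory.newInstance().newXPath();
        // Without this context the engine cannot resolve w:, wp:, a:, pic: or r:.
        xpath.setNamespaceContext(new NamespaceContext() {
            @Override
            public String getNamespaceURI(String prefix) {
                if (prefix == null) {
                    throw new IllegalArgumentException("prefix must not be null");
                }
                return NAMESPACES.getOrDefault(prefix, XMLConstants.NULL_NS_URI);
            }

            @Override
            public String getPrefix(String namespaceURI) {
                return null; // reverse lookup is not needed for this sketch
            }

            @Override
            public Iterator<String> getPrefixes(String namespaceURI) {
                return null; // reverse lookup is not needed for this sketch
            }
        });

        NodeList nodes = (NodeList) xpath.evaluate(
            ".//w:drawing/wp:inline/a:graphic/a:graphicData/pic:pic/pic:blipFill/a:blip/@r:embed",
            doc, XPathConstants.NODESET);

        List<String> ids = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            ids.add(nodes.item(i).getNodeValue());
        }
        return ids;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(embedIds(SAMPLE_RUN));
    }
}
```

Note that the path only matches pictures under wp:inline. A picture stored under a different parent element would need a different or more general path.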