
Working with DOCX files using docx4j: replacing content, converting to PDF and HTML, and more

最编程 2024-08-08 08:00:27
...

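I want to use this library to work with docx files: mainly to manipulate paragraphs and similar elements, to replace content in a docx so that documents can be generated dynamically from a template, and to convert docx to PDF.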

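The Word format is essentially XML, and docx4j works on docx documents through that XML, which makes the model easy to understand. As with any document tree, we operate on the docx as a tree of nodes; in docx4j the root node is a package. From the root we can reach all of the content, search the document for all elements of a given type, and access an element at a given position by index.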
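To make the "package of XML parts" idea concrete: a .docx file is a zip archive whose entries are mostly XML files, with the document body stored in word/document.xml. The sketch below is not docx4j code. It uses only java.util.zip, and the class name DocxPartLister and the two-entry sample package are made up for illustration (a real .docx contains more parts).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

class DocxPartLister {

    /** Returns the entry names (parts) of a zip-based package, in archive order. */
    static List<String> listParts(byte[] packageBytes) {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(packageBytes))) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                names.add(entry.getName());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return names;
    }

    /** Builds a tiny stand-in package in memory (hypothetical content, not a valid docx). */
    static byte[] buildSamplePackage() {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(out)) {
            addEntry(zip, "[Content_Types].xml", "<Types/>");
            addEntry(zip, "word/document.xml", "<w:document/>");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    private static void addEntry(ZipOutputStream zip, String name, String body) throws IOException {
        zip.putNextEntry(new ZipEntry(name));
        zip.write(body.getBytes(StandardCharsets.UTF_8));
        zip.closeEntry();
    }

    public static void main(String[] args) {
        // Prints one part name per line.
        for (String name : listParts(buildSamplePackage())) {
            System.out.println(name);
        }
    }
}
```

To inspect a real document, the bytes of an actual .docx file could be passed to listParts instead of the sample package.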

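The package downloaded from the docx4j website lacks the slf4j support jars, and when converting to PDF the bundled fop-2.3 jar conflicts with the docx4j jars. A download link for the cleaned-up docx4j jars and their dependencies is given at the end of the article.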

1. Downloading docx4j

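Download it from the official website. The zip contains the jars as well as examples, and the examples below are taken from those official examples. Note that the lib directory of the official download lacks the log4j jar and the slf4j-log4j jar needed for logging.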

Official download page: https://www.docx4java.org/downloads.html

2. Basic usage

0. docx4j.properties sets global defaults for docx4j, such as paper size and page orientation. Below is the sample configuration from the official website.

# Page size: use a value from org.docx4j.model.structure.PageSizePaper enum
# eg A4, LETTER
docx4j.PageSize=LETTER
# Page margins: use a value from org.docx4j.model.structure.MarginsWellKnown enum
docx4j.PageMargins=NORMAL
docx4j.PageOrientationLandscape=false
# Page size: use a value from org.pptx4j.model.SlideSizesWellKnown enum
# eg A4, LETTER
pptx4j.PageSize=LETTER
pptx4j.PageOrientationLandscape=false
# These will be injected into docProps/app.xml
# if App.Write=true
docx4j.App.write=true
docx4j.Application=docx4j
docx4j.AppVersion=2.7
# of the form XX.YYYY where X and Y represent numerical values
# These will be injected into docProps/core.xml
docx4j.dc.write=true
docx4j.dc.creator.value=docx4j
docx4j.dc.lastModifiedBy.value=docx4j
#
#docx4j.McPreprocessor=true
# If you haven't configured log4j yourself
# docx4j will autoconfigure it. Set this to true to disable that
docx4j.Log4j.Configurator.disabled=false
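docx4j reads this file itself, so no application code is needed to apply it. Still, the format is the ordinary Java properties format, and a quick way to sanity-check a file before deploying it is to parse it with java.util.Properties. The sketch below does only that. The class name Docx4jPropertiesPeek is made up, and the sample text is a shortened copy of the configuration above.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

class Docx4jPropertiesPeek {

    /** Parses properties-format text; lines starting with '#' are comments. */
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        String sample = String.join("\n",
                "# Page size",
                "docx4j.PageSize=LETTER",
                "docx4j.PageOrientationLandscape=false");
        Properties props = parse(sample);
        // Values are plain strings; booleans must be converted by the reader.
        System.out.println(props.getProperty("docx4j.PageSize"));
        System.out.println(Boolean.parseBoolean(props.getProperty("docx4j.PageOrientationLandscape")));
    }
}
```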

 

1. Creating a new docx document

    /**
     * Creates a simple docx file
     */
    private static void createDocx() {
        // Create the package
        WordprocessingMLPackage wordMLPackage;
        try {
            wordMLPackage = WordprocessingMLPackage.createPackage();
            // Save as a new file
            wordMLPackage.save(new File("C:/Users/liqiang/Desktop/docx4j/helloworld.docx"));
        } catch (InvalidFormatException e) {
            log.error("createDocx error:InvalidFormatException", e);
        } catch (Docx4JException e) {
            log.error("createDocx error: Docx4JException", e);
        }
    }

 

Calling WordprocessingMLPackage.createPackage() creates a package, and calling its save(file) method writes it out as a new file.

Note: another commonly used way to save is:

Docx4J.save(wordMLPackage, new File("C:/Users/liqiang/Desktop/docx4j/helloworld_2.docx"));

 

2. Adding paragraphs to a file

    /**
     * Adds paragraphs. Remember to save afterwards, otherwise the change is not written to the file
     */
    public static void addParagraph() {
        WordprocessingMLPackage wordprocessingMLPackage;
        try {
            wordprocessingMLPackage = WordprocessingMLPackage
                    .load(new File("C:/Users/liqiang/Desktop/docx4j/helloworld.docx"));
            wordprocessingMLPackage.getMainDocumentPart().addParagraphOfText("Hello Word!");
            wordprocessingMLPackage.getMainDocumentPart().addStyledParagraphOfText("Title", "Hello Word!");
            wordprocessingMLPackage.getMainDocumentPart().addStyledParagraphOfText("Subtitle", " a subtitle!");
            wordprocessingMLPackage.save(new File("C:/Users/liqiang/Desktop/docx4j/helloworld.docx"));
        } catch (Docx4JException e) {
            log.error("addParagraph to docx error: Docx4JException", e);
        }
    }

WordprocessingMLPackage.load(file) loads an existing docx. Remember to call save at the end; otherwise the changes exist only in memory and never reach the file.

 

[Figure: contents of the resulting file]

 

3. A second way to add a paragraph, using the factory class (the factory is also the general-purpose approach)

    /**
     * Adds a paragraph. Remember to save afterwards, otherwise the change is not written to the file
     */
    public static void addParagraph2(String simpleText) {

        try {
            WordprocessingMLPackage wordprocessingMLPackage = WordprocessingMLPackage
                    .load(new File("C:/Users/liqiang/Desktop/docx4j/helloworld.docx"));
            org.docx4j.wml.ObjectFactory factory = Context.getWmlObjectFactory();
            org.docx4j.wml.P para = factory.createP();
            if (simpleText != null) {
                org.docx4j.wml.Text t = factory.createText();
                t.setValue(simpleText);
                org.docx4j.wml.R run = factory.createR();
                run.getContent().add(t);
                para.getContent().add(run);
            }
            wordprocessingMLPackage.getMainDocumentPart().getContent().add(para);
            wordprocessingMLPackage.save(new File("C:/Users/liqiang/Desktop/docx4j/helloworld.docx"));
        } catch (Exception e) {
            log.error("addParagraph to docx error: Docx4JException", e);
        }
    }

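First obtain a factory. (The ObjectFactory to import is the one in org.docx4j.wml; if you import the wrong one, everything that follows will fail.)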

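R is a run: a block that groups content sharing the same properties so that it can be handled as a unit. Content is added through its content list (getContent()), and RPr holds the run properties (a member of class R) used to style the run. A run is in turn added to the content of another object. So the general steps for adding an element B to an element A through a run are: (1) create the R; (2) add the content element B to the R; (3) add the R to element A; (4) add element A to the content of the mainDocumentPart.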
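It can help to see the nesting these steps produce in the underlying XML: a paragraph (w:p) contains runs (w:r), and each run contains its optional properties (w:rPr) followed by its text (w:t). The sketch below builds that markup with plain strings. It is a toy illustration, not docx4j code: the class RunXmlSketch and its methods are made up, and namespace declarations are omitted.

```java
class RunXmlSketch {

    /** A text element; XML special characters are escaped. */
    static String text(String value) {
        return "<w:t>" + escape(value) + "</w:t>";
    }

    /** A run wrapping the given content; run properties (here only bold) come first. */
    static String run(boolean bold, String content) {
        String rPr = bold ? "<w:rPr><w:b/></w:rPr>" : "";
        return "<w:r>" + rPr + content + "</w:r>";
    }

    /** A paragraph containing the given runs in order. */
    static String paragraph(String... runs) {
        return "<w:p>" + String.join("", runs) + "</w:p>";
    }

    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) {
        // (1) create the text, (2) wrap it in a run, (3) add the runs to a paragraph.
        String xml = paragraph(run(false, text("Hello")), run(true, text("World")));
        System.out.println(xml);
        // prints <w:p><w:r><w:t>Hello</w:t></w:r><w:r><w:rPr><w:b/></w:rPr><w:t>World</w:t></w:r></w:p>
    }
}
```

The two runs differ only in their properties, which is why text with different formatting inside one paragraph ends up in separate R objects.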

 

[Figure: some commonly used methods of the factory class]

 

4. Reading the contents of a file

    private static void readParagraph() {
        try {
            WordprocessingMLPackage wordprocessingMLPackage = WordprocessingMLPackage
                    .load(new File("C:/Users/liqiang/Desktop/docx4j/helloworld.docx"));

            String contentType = wordprocessingMLPackage.getContentType();
            log.info("contentType -> {}", contentType);

            MainDocumentPart mainDocumentPart = wordprocessingMLPackage.getMainDocumentPart();
            List<Object> content = mainDocumentPart.getContent();
            for (Object ob : content) {
                log.info("ob -> {}", ob);
            }
        } catch (Docx4JException e) {
            log.error("readParagraph error: Docx4JException", e);
        }
    }

Output:

2018-10-28 13:13:16 [cn.qlq.docx4j.Docx4jTest]-[INFO] contentType -> application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml
2018-10-28 13:13:16 [cn.qlq.docx4j.Docx4jTest]-[INFO] ob -> Hello Word!
2018-10-28 13:13:16 [cn.qlq.docx4j.Docx4jTest]-[INFO] ob -> Hello Word!
2018-10-28 13:13:16 [cn.qlq.docx4j.Docx4jTest]-[INFO] ob ->  a subtitle!
2018-10-28 13:13:16 [cn.qlq.docx4j.Docx4jTest]-[INFO] ob -> Hello Word!
2018-10-28 13:13:16 [cn.qlq.docx4j.Docx4jTest]-[INFO] ob -> Hello Word!
2018-10-28 13:13:16 [cn.qlq.docx4j.Docx4jTest]-[INFO] ob ->  a subtitle!
2018-10-28 13:13:16 [cn.qlq.docx4j.Docx4jTest]-[INFO] ob -> Hello Word!
2018-10-28 13:13:16 [cn.qlq.docx4j.Docx4jTest]-[INFO] ob -> Hello Word!
2018-10-28 13:13:16 [cn.qlq.docx4j.Docx4jTest]-[INFO] ob ->  a subtitle!

 

5. Creating tables

(1) Creating a plain table

    public static void addTable() {
        try {
            WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage.createPackage();
            ObjectFactory factory = Context.getWmlObjectFactory();
            MainDocumentPart mainDocumentPart = wordMLPackage.getMainDocumentPart();

            // Create the table element (no borders are set in this first version)
            Tbl table = factory.createTbl();

            for (int i = 0; i < 3; i++) {
                Tr tr = factory.createTr();
                for (int j = 0; j < 3; j++) {
                    Tc tc = factory.createTc();
                    P p = mainDocumentPart.createParagraphOfText("---row" + i + "---column" + j + "---");
                    tc.getContent().add(p);
                    tr.getContent().add(tc);

                }
                table.getContent().add(tr);
            }

            mainDocumentPart.addObject(table);
            wordMLPackage.save(new java.io.File("C:/Users/liqiang/Desktop/docx4j/helloworld.docx"));
        } catch (Docx4JException e) {
            log.error("createDocx error: Docx4JException", e);
        }

    }

 

Here is the source of createParagraphOfText(str): (1) create a Text and set its value; (2) create an R and add the Text to it; (3) create a P and add the R to it.

    public org.docx4j.wml.P createParagraphOfText(String simpleText) {
        
        org.docx4j.wml.ObjectFactory factory = Context.getWmlObjectFactory();
        org.docx4j.wml.P  para = factory.createP();

        if (simpleText!=null) {
            org.docx4j.wml.Text  t = factory.createText();
            t.setValue(simpleText);
    
            org.docx4j.wml.R  run = factory.createR();
            run.getContent().add(t); // ContentAccessor        
            
            para.getContent().add(run); // ContentAccessor
        }
        
        return para;
    }

[Figure: the resulting table, rendered without borders]

 

The table above is created, but it has no borders. Next we look at more involved operations: showing borders, merging cells and styling cells.

 

(2) Showing the table borders

public static void addTable() {
        try {
            WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage.createPackage();
            ObjectFactory factory = Context.getWmlObjectFactory();
            MainDocumentPart mainDocumentPart = wordMLPackage.getMainDocumentPart();

            // 0. Create the table element
            Tbl table = factory.createTbl();

            // 1. Show the table borders
            addBorders(table);

            // 2. Add the table content (create rows and cells)
            for (int i = 0; i < 3; i++) {
                Tr tr = factory.createTr();
                for (int j = 0; j < 3; j++) {
                    Tc tc = factory.createTc();
                    P p = mainDocumentPart.createParagraphOfText("---row" + i + "---column" + j + "---");//

                    tc.getContent().add(p);
                    tr.getContent().add(tc);

                }
                table.getContent().add(tr);
            }

            // 3. Add the table to the main document part
            mainDocumentPart.addObject(table);
            wordMLPackage.save(new java.io.File("C:/Users/liqiang/Desktop/docx4j/helloworld.docx"));
        } catch (Docx4JException e) {
            log.error("createDocx error: Docx4JException", e);
        }
    }

    /**
     * Sets the border style
     * 
     * @param table
     *            the table whose borders are to be set
     */
    private static void addBorders(Tbl table) {
        table.setTblPr(new TblPr());// A TblPr must be set first; otherwise getTblPr() below returns null and causes a NullPointerException

        CTBorder border = new CTBorder();
        border.setColor("auto");
        border.setSz(new BigInteger("4"));
        border.setSpace(new BigInteger("0"));
        border.setVal(STBorder.SINGLE);

        TblBorders borders = new TblBorders();
        borders.setBottom(border);
        borders.setLeft(border);
        borders.setRight(border);
        borders.setTop(border);
        borders.setInsideH(border);
        borders.setInsideV(border);

        // Set the borders on the table's TblPr
        table.getTblPr().setTblBorders(borders);
    } 

[Figure: the resulting table with borders]

 

(3) Making the cell text bold, setting column widths, and similar operations

    public static void addTable() {
        try {
            WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage.createPackage();
            ObjectFactory factory = Context.getWmlObjectFactory();
            MainDocumentPart mainDocumentPart = wordMLPackage.getMainDocumentPart();

            // 0. Create the table element
            Tbl table = factory.createTbl();

            // 1. Show the table borders
            addBorders(table);

            // 2. Add the table content (create rows and cells)
            for (int i = 0; i < 3; i++) {
                Tr tr = factory.createTr();
                for (int j = 0; j < 3; j++) {
                    Tc tc = factory.createTc();

                    // P p = mainDocumentPart.createParagraphOfText("---row" + i
                    // + "---column" + j + "---");
                    // Second approach: build the P by hand so that it can be styled
                    P p1 = factory.createP();
                    R r = factory.createR();
                    Text text = factory.createText();
                    text.setValue("---row" + i + "---column" + j + "---");

                    r.getContent().add(text);
                    p1.getContent().add(r);

                    // 2.1 Set bold, font size and other properties through the R
                    setRStyle(r);
                    // 2.2 Set the column width
                    if (j == 1) {
                        setCellWidth(tc, 1250);
                    } else {
                        setCellWidth(tc, 2500);
                    }

                    tc.getContent().add(p1);
                    tr.getContent().add(tc);

                }
                table.getContent().add(tr);
            }

            // 3. Merge cells (not implemented here)

            // 4. Add the table to the main document part
            mainDocumentPart.addObject(table);
            wordMLPackage.save(new java.io.File("C:/Users/liqiang/Desktop/docx4j/helloworld.docx"));
        } catch (Docx4JException e) {
            log.error("createDocx error: Docx4JException", e);
        }
    }

    /**
     * Sets the column width
     * 
     * @param tc
     * @param width
     */
    private static void setCellWidth(Tc tc, int width) {
        TcPr tableCellProperties = new TcPr();
        TblWidth tableWidth = new TblWidth();
        tableWidth.setW(BigInteger.valueOf(width));
        tableCellProperties.setTcW(tableWidth);

        tc.setTcPr(tableCellProperties);
    }

    /**
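     * Sets the run properties for the table text: bold, with a font size value of 25
     * (sz is measured in half-points, so 25 corresponds to 12.5 pt)
     * 
     * @param r
     *            the run to style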
     */
    private static void setRStyle(R r) {
        // 1. Create an RPr
        RPr rpr = new RPr();

        // 2. Configure the RPr
        // 2.1 Set the font size
        HpsMeasure size = new HpsMeasure();
        size.setVal(new BigInteger("25"));
        rpr.setSz(size);
        // 2.2 Set bold
        BooleanDefaultTrue bold = new BooleanDefaultTrue();
        bold.setVal(true);
        rpr.setB(bold);

        // 3. Attach the RPr to the R
        r.setRPr(rpr);
    }

    /**
     * Sets the border style
     * 
     * @param table
     *            the table whose borders are to be set
     */
    private static void addBorders(Tbl table) {
        table.setTblPr(new TblPr());// A TblPr must be set first; otherwise getTblPr() below returns null and causes a NullPointerException

        CTBorder border = new CTBorder();
        border.setColor("auto");
        border.setSz(new BigInteger("4"));
        border.setSpace(new BigInteger("0"));
        border.setVal(STBorder.SINGLE);

        TblBorders borders = new TblBorders();
        borders.setBottom(border);
        borders.setLeft(border);
        borders.setRight(border);
        borders.setTop(border);
        borders.setInsideH(border);
        borders.setInsideV(border);

        // Set the borders on the table's TblPr
        table.getTblPr().setTblBorders(borders);
    }
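The numbers passed in the listing above (25 for the font size, 4 for the border size, 2500 and 1250 for the cell widths) are in different WordprocessingML units, which is easy to get wrong. Font size (sz in RPr) is in half-points, border size is in eighths of a point, and the cell widths are assumed here to be interpreted as twentieths of a point (twips, the "dxa" width type), since the listing sets no explicit type. The helper below is a plain-Java sketch of the conversions; the class name WordUnits is made up and is not part of docx4j.

```java
class WordUnits {

    /** Font size values (w:sz) are half-points. */
    static double halfPointsToPoints(long halfPoints) {
        return halfPoints / 2.0;
    }

    /** Border size values are eighths of a point. */
    static double eighthPointsToPoints(long eighths) {
        return eighths / 8.0;
    }

    /** Widths of type dxa are twentieths of a point (twips). */
    static double twipsToPoints(long twips) {
        return twips / 20.0;
    }

    /** 1440 twips make one inch (72 points per inch, 20 twips per point). */
    static double twipsToInches(long twips) {
        return twips / 1440.0;
    }

    public static void main(String[] args) {
        System.out.println(halfPointsToPoints(25));  // prints 12.5
        System.out.println(eighthPointsToPoints(4)); // prints 0.5
        System.out.println(twipsToPoints(2500));     // prints 125.0
        System.out.println(twipsToInches(2500));
    }
}
```

So the text in the example is 12.5 pt, the borders are half a point thick, and, under the dxa assumption, the wide columns are 125 pt across.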

[Figure: the resulting table with bold text and adjusted column widths]

For merging cells and other more complex table operations, see: https://www.cnblogs.com/cxxjohnson/p/7886275.html

 

6. Reading table content (walking the docx4j tree to get elements of a given type)

[Figure: the table in the input document]

 

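Code: (Sometimes the objects returned by getContent() are direct elements such as Tr and can simply be cast. Sometimes they cannot be cast directly because they are wrapped in a JAXBElement and must be unwrapped first; that is what the getAllElementFromObject method does.)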

package cn.qlq.docx4j;

import java.util.ArrayList;
import java.util.List;

import javax.xml.bind.JAXBElement;
import javax.xml.bind.JAXBException;

import org.docx4j.TraversalUtil;
import org.docx4j.finders.ClassFinder;
import org.docx4j.openpackaging.exceptions.Docx4JException;
import org.docx4j.openpackaging.packages.WordprocessingMLPackage;
import org.docx4j.openpackaging.parts.WordprocessingML.MainDocumentPart;
import org.docx4j.wml.ContentAccessor;
import org.docx4j.wml.Tbl;
import org.docx4j.wml.Tc;
import org.docx4j.wml.Tr;

/**
 * Iterates over the content of a table (the basis for replacing it in a loop)
 * 
 * @author QiaoLiQiang
 * @time 2018-10-28 20:51:41
 */
public class ReplaceTable {

    public static void main(String[] args) throws JAXBException {
        String template = "C:/Users/liqiang/Desktop/docx4j/helloworld_1.docx";
        WordprocessingMLPackage wordMLPackage;
        try {
            wordMLPackage = WordprocessingMLPackage.load(new java.io.File(template));
            MainDocumentPart documentPart = wordMLPackage.getMainDocumentPart();

            // 1. Use a ClassFinder to collect all elements of the given type
            ClassFinder find = new ClassFinder(Tbl.class);
            new TraversalUtil(documentPart.getContent(), find);

            Tbl table = (Tbl) find.results.get(0);// the first table element
            List<Object> trs = table.getContent();
            System.out.println(trs);
            System.out.println("=====================");

            for (Object obj : trs) {
                Tr tr = (Tr) obj;// the row
                List<Object> content = tr.getContent();
                System.out.println(content);
                List<Object> objList = getAllElementFromObject(tr, Tc.class);// all Tc elements in the row
                for (Object obj1 : objList) {
                    Tc tc = (Tc) obj1;
                    System.out.println(tc.getContent());
                }
                System.out.println("===============");
            }
        } catch (Docx4JException e) {
            e.printStackTrace();
        }
    }

    private static List<Object> getAllElementFromObject(Object obj, Class<?> toSearch) {
        List<Object> result = new ArrayList<Object>();
        if (obj instanceof JAXBElement)
            obj = ((JAXBElement<?>) obj).getValue();
        if (obj.getClass().equals(toSearch))
            result.add(obj);
        else if (obj instanceof ContentAccessor) {
            List<?> children = ((ContentAccessor) obj).getContent();
            for (Object child : children) {
                result.addAll(getAllElementFromObject(child, toSearch));
            }
        }
        return result;
    }
}

Output:

[org.docx4j.wml.Tr@234f18c8, org.docx4j.wml.Tr@1de40494, org.docx4j.wml.Tr@64e89fe0, org.docx4j.wml.Tr@64585ee1, org.docx4j.wml.Tr@65bd393e, org.docx4j.wml.Tr@69f949a0]
=====================
[javax.xml.bind.JAXBElement@6d50ddba, javax.xml.bind.JAXBElement@580d1667, javax.xml.bind.JAXBElement@4339f15a]
[姓名]
[性别]
[年龄]
===============
[javax.xml.bind.JAXBElement@11146e31, javax.xml.bind.JAXBElement@544e5bb9, javax.xml.bind.JAXBElement@6467f9ec]
[name0]
[sex0]
[age0]
===============
[javax.xml.bind.JAXBElement@66492873, javax.xml.bind.JAXBElement@4cfeca7b, javax.xml.bind.JAXBElement@6b9f78ba]
[name1]
[sex1]
[age1]
===============
[javax.xml.bind.JAXBElement@32af3289, javax.xml.bind.JAXBElement@c1eda5e, javax.xml.bind.JAXBElement@3d925789]
[name2]
[sex2]
[age2]
===============
[javax.xml.bind.JAXBElement@52b102f3, javax.xml.bind.JAXBElement@6338c9ee, javax.xml.bind.JAXBElement@25515b26]
[name3]
[sex3]
[age3]
===============
[javax.xml.bind.JAXBElement@372eee, javax.xml.bind.JAXBElement@26ea0b5e, javax.xml.bind.JAXBElement@4f905c47]
[name4]
[sex4]
[age4]
===============

 

7. Formatting and styles

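Sometimes we need to apply formatting. Each element has an XXXPr property used to control its style, where Pr stands for Properties (for example RPr for R, TblPr for Tbl and TcPr for Tc, as used above).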

 

3. Advanced docx4j usage

1. Converting docx to HTML

See the official sample on GitHub: https://github.com/plutext/docx4j/blob/master/src/samples/docx4j/org/docx4j/samples/ConvertOutHtml.java

package cn.qlq.docx4j;

import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.OutputStream;

import org.docx4j.Docx4J;
import org.docx4j.Docx4jProperties;
import org.docx4j.convert.out.HTMLSettings;
import org.docx4j.openpackaging.packages.WordprocessingMLPackage;
import org.docx4j.samples.AbstractSample;

public class Docx2Html extends AbstractSample {

    static {

        inputfilepath = "C:/Users/liqiang/Desktop/docx4j/helloworld.docx";
        save = true;
        nestLists = true;
    }

    static boolean save;
    static boolean nestLists;

    public static void main(String[] args) throws Exception {
        WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage
                .load(new File("C:/Users/liqiang/Desktop/docx4j/helloworld.docx"));

        HTMLSettings htmlSettings = Docx4J.createHTMLSettings();

        htmlSettings.setImageDirPath(inputfilepath + "_files");
        htmlSettings.setImageTargetUri(inputfilepath.substring(inputfilepath.lastIndexOf("/") + 1) + "_files");
        htmlSettings.setWmlPackage(wordMLPackage);

        String userCSS = null;
        if (nestLists) {
            userCSS = "html, body, div, span, h1, h2, h3, h4, h5, h6, p, a, img,  table, caption, tbody, tfoot, thead, tr, th, td "
                    + "{ margin: 0; padding: 0; border: 0;}" + "body {line-height: 1;} ";
        } else {
            userCSS = "html, body, div, span, h1, h2, h3, h4, h5, h6, p, a, img,  ol, ul, li, table, caption, tbody, tfoot, thead, tr, th, td "
                    + "{ margin: 0; padding: 0; border: 0;}" + "body {line-height: 1;} ";

        }
        htmlSettings.setUserCSS(userCSS);

        OutputStream os;
        if (save) {
            os = new FileOutputStream(inputfilepath + ".html");
        } else {
            os = new ByteArrayOutputStream();
        }

        Docx4jProperties.setProperty("docx4j.Convert.Out.HTML.OutputMethodXML", true);

        Docx4J.toHTML(htmlSettings, os, Docx4J.FLAG_EXPORT_PREFER_XSL);

        if (save) {
            System.out.println("Saved: " + inputfilepath + ".html ");
        } else {
            System.out.println(((ByteArrayOutputStream) os).toString());
        }

        if (wordMLPackage.getMainDocumentPart().getFontTablePart() != null) {
            wordMLPackage.getMainDocumentPart().getFontTablePart().deleteEmbeddedFontTempFiles();
        }
        htmlSettings = null;
        wordMLPackage = null;
    }
}

 

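Wrapped up as a simpler utility class, the code looks like this. (userCSS is the stylesheet applied to the generated HTML; it can be set by hand, which gives flexible control over margins, fonts and so on.)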

package cn.qlq.docx4j;

import java.io.File;
import java.io.FileOutputStream;

import org.docx4j.Docx4J;
import org.docx4j.Docx4jProperties;
import org.docx4j.convert.out.HTMLSettings;
import org.docx4j.openpackaging.packages.WordprocessingMLPackage;
import org.docx4j.samples.AbstractSample;

public class Docx2Html extends AbstractSample {

    public static void main(String[] args) throws Exception {
        String inputfilepath = "C:/Users/liqiang/Desktop/docx4j/helloworld.docx";
        boolean nestLists = true;

        WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage
                .load(new File("C:/Users/liqiang/Desktop/docx4j/helloworld.docx"));

        HTMLSettings htmlSettings = Docx4J.createHTMLSettings();

        htmlSettings.setImageDirPath(inputfilepath + "_files");
        htmlSettings.setImageTargetUri(inputfilepath.substring(inputfilepath.lastIndexOf("/") + 1) + "_files");
        htmlSettings.setWmlPackage(wordMLPackage);

        String userCSS = null;
        if (nestLists) {
            userCSS = "html, body, div, span, h1, h2, h3, h4, h5, h6, p, a, img,  table, caption, tbody, tfoot, thead, tr, th, td "
                    + "{ margin: 0; padding: 0; border: 0;}" + "body {line-height: 1;} ";
        } else {
            userCSS = "html, body, div, span, h1, h2, h3, h4, h5, h6, p, a, img,  ol, ul, li, table, caption, tbody, tfoot, thead, tr, th, td "
                    + "{ margin: 0; padding: 0; border: 0;}" + "body {line-height: 1;} ";

        }
        htmlSettings.setUserCSS(userCSS);

        Docx4jProperties.setProperty("docx4j.Convert.Out.HTML.OutputMethodXML", true);

        Docx4J.toHTML(htmlSettings, new FileOutputStream(new File("C:/Users/liqiang/Desktop/docx4j/helloworld.html")),
                Docx4J.FLAG_EXPORT_PREFER_XSL);

        if (wordMLPackage.getMainDocumentPart().getFontTablePart() != null) {
            wordMLPackage.getMainDocumentPart().getFontTablePart().deleteEmbeddedFontTempFiles();
        }
        htmlSettings = null;
        wordMLPackage = null;
    }
}
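The two image settings in the listings are plain string manipulations of the input path, and it is worth seeing what they evaluate to. As far as I understand them, setImageDirPath names the directory where images extracted from the docx are written, and setImageTargetUri is the prefix used for image references in the generated HTML; treat that reading as an assumption. The sketch below only reproduces the string expressions from the listing, and the class name HtmlImagePaths is made up.

```java
class HtmlImagePaths {

    /** Same expression as passed to setImageDirPath in the listing. */
    static String imageDirPath(String inputPath) {
        return inputPath + "_files";
    }

    /** Same expression as passed to setImageTargetUri: file name only, plus "_files". */
    static String imageTargetUri(String inputPath) {
        return inputPath.substring(inputPath.lastIndexOf("/") + 1) + "_files";
    }

    public static void main(String[] args) {
        String input = "C:/Users/liqiang/Desktop/docx4j/helloworld.docx";
        System.out.println(imageDirPath(input));   // prints C:/Users/liqiang/Desktop/docx4j/helloworld.docx_files
        System.out.println(imageTargetUri(input)); // prints helloworld.docx_files
    }
}
```

Because the target URI is relative, the HTML file has to be written to the same directory as the docx for the image references to resolve, which is what both listings do.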

 

2. Converting docx to PDF

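The code is simple, but it pulls in many dependencies: the batik jars for SVG handling and the fop jars. Moreover, the fop-2.3.jar bundled under optional\export-fo in docx4j-community-6.0.1.zip conflicts with docx4j, and the conversion only worked with fop-2.1. So delete the bundled 2.3 jar and download version 2.1 separately.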

    public static void main(String[] args) throws Exception {
        WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage
                .load(new File("C:/Users/liqiang/Desktop/docx4j/helloworld.docx"));

        Docx4J.toPDF(wordMLPackage, new FileOutputStream(new File("C:/Users/liqiang/Desktop/docx4j/helloworld.pdf")));

    }

 

3. Inserting an image into a docx

package cn.qlq.docx4j;

import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;

import org.docx4j.openpackaging.packages.WordprocessingMLPackage;
import org.docx4j.openpackaging.parts.WordprocessingML.BinaryPartAbstractImage;
import org.docx4j.wml.Drawing;
import org.docx4j.wml.ObjectFactory;
import org.docx4j.wml.P;
import org.docx4j.wml.R;

public class ImageHandle {
    /**
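     * As usual, we create a package to hold the document. We then create a File object pointing to the
     * image to be added. To be able to work with the image, we convert it to a byte array. Finally we add
     * the image to the package and save the package.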
     */
    public static void main(String[] args) throws Exception {
        WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage.createPackage();
        File file = new File("C:/Users/liqiang/Desktop/docx4j/3.jpg");
        byte[] bytes = convertImageToByteArray(file);
        addImageToPackage(wordMLPackage, bytes);
        wordMLPackage.save(new java.io.File("C:/Users/liqiang/Desktop/docx4j/helloworld.docx"));
    }

    /**
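     * docx4j has a utility method that creates an image part from a byte array and adds it to the given
     * package. To add the image to a paragraph, we need to turn it into an inline object. There is a method
     * for this as well, which takes a filename hint, alternative text, two id values and a flag saying
     * whether the image is linked rather than embedded. One id is for the non-visual properties of the
     * drawing object in the document, the other for the non-visual drawing properties of the picture itself.
     * Finally we add the inline object to a paragraph and add the paragraph to the package's main document part.
     *
     * @param wordMLPackage
     *            the package to add the image to
     * @param bytes
     *            the image as a byte array
     * @throws Exception
     *             unfortunately createImageInline throws a plain Exception (no more specific type)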
     */
    private static void addImageToPackage(WordprocessingMLPackage wordMLPackage, byte[] bytes) throws Exception {
        BinaryPartAbstractImage imagePart = BinaryPartAbstractImage.createImagePart(wordMLPackage, bytes);

        int docPrId = 1;
        int cNvPrId = 2;
        org.docx4j.dml.wordprocessingDrawing.Inline inline = imagePart.createImageInline("Filename hint",
                "Alternative text", docPrId, cNvPrId, false);

        P paragraph = addInlineImageToParagraph(inline);

        wordMLPackage.getMainDocumentPart().addObject(paragraph);
    }

    /**
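     * Creates an object factory and uses it to create a paragraph and a run (R). The run is added to the
     * paragraph. Next a Drawing is created and added to the run. Finally the inline object is added to the
     * drawing and the paragraph is returned.
     *
     * @param inline
     *            the inline object containing the image
     * @return the paragraph containing the image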
     */
    private static P addInlineImageToParagraph(org.docx4j.dml.wordprocessingDrawing.Inline inline) {
        // Add the inline object to a paragraph
        ObjectFactory factory = new ObjectFactory();
        P paragraph = factory.createP();
        R run = factory.createR();
        paragraph.getContent().add(run);
        Drawing drawing = factory.createDrawing();
        run.getContent().add(drawing);
        drawing.getAnchorOrInline().add(inline);
        return paragraph;
    }

    /**
     * Converts an image file to a byte array.
     * 
     * @param file
     *            the file to convert
     * @return a byte array containing the image data
     * @throws FileNotFoundException
     * @throws IOException
     */
    private static byte[] convertImageToByteArray(File file) throws FileNotFoundException, IOException {
        InputStream is = new FileInputStream(file);
        long length = file.length();
        // An array cannot be created with a long length; the length must fit in an int.
        if (length > Integer.MAX_VALUE) {
            is.close();
            throw new IOException("File too large: " + file.getName());
        }
        byte[] bytes = new byte[(int) length];
        int offset = 0;
        int numRead = 0;
        while (offset < bytes.length && (numRead = is.read(bytes, offset, bytes.length - offset)) >= 0) {
            offset += numRead;
        }
        // Check that all bytes were read
        if (offset < bytes.length) {
            System.out.println("Could not completely read file " + file.getName());
        }
        is.close();
        return bytes;
    }
}
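The manual read loop in convertImageToByteArray can be replaced by a single standard-library call on Java 7 and later. The sketch below is one possible alternative, not part of the original example: the class name ImageBytes and the roundTripDemo helper are made up, and IOException is wrapped in UncheckedIOException only to keep the demo short.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

class ImageBytes {

    /** Reads the whole file into a byte array; the stream is opened and closed internally. */
    static byte[] read(Path imagePath) {
        try {
            return Files.readAllBytes(imagePath);
        } catch (IOException e) {
            throw new UncheckedIOException("Could not read " + imagePath, e);
        }
    }

    /** Writes four bytes to a temporary file, reads them back and returns the number of bytes read. */
    static int roundTripDemo() {
        try {
            Path tmp = Files.createTempFile("docx4j-image", ".bin");
            try {
                Files.write(tmp, new byte[] {1, 2, 3, 4});
                return read(tmp).length;
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTripDemo()); // prints 4
    }
}
```

In the ImageHandle example, the result of read(file.toPath()) could be passed to addImageToPackage in place of the array returned by convertImageToByteArray.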

 

4. Replacing content in a docx by variable: ${var} substitution

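Note: a ${var} placeholder in the template must be typed in one pass from left to right (do not type ${} first and then fill in the name between the braces). Otherwise Word may store the placeholder split across several runs, and the replacement will not find it.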
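The effect of a split placeholder can be shown without docx4j. The sketch below replaces ${var} placeholders in plain strings with a regular expression; each string stands for the text of one run. It is only an illustration of the idea: the class PlaceholderSketch is made up and this is not how docx4j implements replacement. docx4j has its own helpers for this task (I believe VariablePrepare.prepare and MainDocumentPart.variableReplace; check the version you use).

```java
import java.util.Collections;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PlaceholderSketch {

    // Matches ${name} and captures the name between the braces.
    private static final Pattern VAR = Pattern.compile("\\$\\{([^}]+)\\}");

    /** Replaces every ${name} that has a value in the map; unknown names are left as they are. */
    static String replace(String text, Map<String, String> values) {
        Matcher m = VAR.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = values.get(m.group(1));
            String replacement = value != null ? value : m.group();
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> values = Collections.singletonMap("name", "Alice");

        // Placeholder stored in a single run: it is found and replaced.
        System.out.println(replace("Hello ${name}!", values)); // prints Hello Alice!

        // Placeholder split across two runs: neither piece matches on its own.
        String run1 = replace("Hello ${na", values);
        String run2 = replace("me}!", values);
        System.out.println(run1 + run2); // prints Hello ${name}!
    }
}
```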
