PDF
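- Short for Portable Document Format
- Developed by Adobe
- PostScript: generates and outputs graphics, giving accurate color and precise print results on any printer
- A font-embedding system, so that fonts travel together with the file
- A structured storage system that bundles the elements and any related content into a single file, with suitable data compression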
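Because everything is bundled into a single file, a PDF can be recognized from its first bytes. The following is a minimal sketch using only the Java standard library; it assumes the `%PDF-` marker sits at offset 0, which well-formed files satisfy. The class and method names (`PdfHeaderCheck`, `looksLikePdf`) are hypothetical and exist only for illustration.

```java
import java.nio.charset.StandardCharsets;

final class PdfHeaderCheck {
    // A PDF file begins with the marker "%PDF-" followed by a version such as 1.7.
    private static final byte[] MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    /** Returns true if the given bytes start with the PDF header marker. */
    static boolean looksLikePdf(byte[] head) {
        if (head == null || head.length < MAGIC.length) {
            return false;
        }
        for (int i = 0; i < MAGIC.length; i++) {
            if (head[i] != MAGIC[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] pdfStart = "%PDF-1.7\n".getBytes(StandardCharsets.US_ASCII);
        byte[] zipStart = "PK".getBytes(StandardCharsets.US_ASCII);
        System.out.println(looksLikePdf(pdfStart)); // true
        System.out.println(looksLikePdf(zipStart)); // false
    }
}
```

In practice the first few bytes would be read from the file before handing it to a PDF library, so that obviously wrong inputs are rejected early.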
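Processing and third-party packages
- Common operations
– Parsing PDF
– Generating PDF (converting from other file types)
- Third-party packages
– Apache PDFBox
– iText (paid)
– XDocReport (converts docx to PDF)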
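PDFBox
- Pure Java library
- Main features: create, extract text, split/merge/delete…
- Main classes
– PDDocument: the PDF document object
– PDFTextStripper: extracts text from a PDF document
– PDFMergerUtility: merge utility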
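// Extract text from a PDF (PDFBox 2.x API)
import java.io.File;
import java.io.IOException;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.encryption.AccessPermission;
import org.apache.pdfbox.text.PDFTextStripper;

public static void main(String[] args) {
    File pdfFile = new File("simple.pdf");
    // try-with-resources closes the document even if extraction fails
    try (PDDocument document = PDDocument.load(pdfFile)) {
        AccessPermission ap = document.getCurrentAccessPermission();
        if (!ap.canExtractContent()) {
            throw new IOException("No permission to extract text");
        }
        int pages = document.getNumberOfPages();
        PDFTextStripper stripper = new PDFTextStripper();
        stripper.setSortByPosition(true);
        // Page numbers for the stripper are 1-based and inclusive
        stripper.setStartPage(1);
        stripper.setEndPage(pages);
        String content = stripper.getText(document);
        System.out.println(content);
    } catch (IOException e) {
        System.out.println(e);
    }
}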
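// Create a one-page PDF containing "hello world" (PDFBox 2.x API)
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.font.PDFont;
import org.apache.pdfbox.pdmodel.font.PDType1Font;

public static void createHelloPDF() {
    try (PDDocument doc = new PDDocument()) {
        PDPage page = new PDPage();
        doc.addPage(page);
        PDFont font = PDType1Font.HELVETICA_BOLD;
        // The content stream must be closed before the document is saved
        try (PDPageContentStream content = new PDPageContentStream(doc, page)) {
            content.beginText();
            content.setFont(font, 12);
            // newLineAtOffset replaces the deprecated moveTextPositionByAmount
            content.newLineAtOffset(100, 700);
            content.showText("hello world");
            content.endText();
        }
        doc.save("test.pdf");
    } catch (Exception e) {
        System.out.println(e);
    }
}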
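// Merge two PDFs into merge.pdf (PDFBox 2.x API)
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import org.apache.pdfbox.io.IOUtils;
import org.apache.pdfbox.io.MemoryUsageSetting;
import org.apache.pdfbox.multipdf.PDFMergerUtility;
import org.apache.pdfbox.pdmodel.PDDocumentInformation;

public static void merge() throws IOException {
    File file1 = new File("sample1.pdf");
    File file2 = new File("sample2.pdf");
    List<InputStream> sources = new ArrayList<InputStream>();
    ByteArrayOutputStream mergedPDFOutputStream = new ByteArrayOutputStream();
    try {
        sources.add(new FileInputStream(file1));
        sources.add(new FileInputStream(file2));
        PDFMergerUtility pdfMerger = new PDFMergerUtility();
        pdfMerger.addSources(sources);
        pdfMerger.setDestinationStream(mergedPDFOutputStream);
        PDDocumentInformation pdfDocumentInfo = new PDDocumentInformation();
        pdfMerger.setDestinationDocumentInformation(pdfDocumentInfo);
        pdfMerger.mergeDocuments(MemoryUsageSetting.setupMainMemoryOnly());
        // Open the output file only after the merge has succeeded,
        // so a failed merge does not leave an empty merge.pdf behind
        try (FileOutputStream fos = new FileOutputStream(new File("merge.pdf"))) {
            fos.write(mergedPDFOutputStream.toByteArray());
        }
    } catch (Exception e) {
        throw new IOException("PDF merge problem", e);
    } finally {
        // Close every input stream that was opened, even after a failure
        for (InputStream source : sources) {
            IOUtils.closeQuietly(source);
        }
    }
}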
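// Remove a page from merge.pdf and save the result as merge2.pdf
import java.io.File;
import org.apache.pdfbox.pdmodel.PDDocument;

public static void main(String[] args) throws Exception {
    File file = new File("merge.pdf");
    // try-with-resources closes the document even if an exception is thrown
    try (PDDocument document = PDDocument.load(file)) {
        int noOfPages = document.getNumberOfPages();
        System.out.println("total pages: " + noOfPages);
        // removePage takes a 0-based index, so this removes the second page
        document.removePage(1);
        System.out.println("page removed");
        document.save("merge2.pdf");
    }
}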
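PDFBox mixes two page-numbering conventions: `PDDocument.removePage(int)` takes a 0-based index, while `PDFTextStripper.setStartPage` and `setEndPage` take 1-based page numbers. The sketch below shows one way to keep user-facing page numbers 1-based and convert them in a single place. The helper name `toZeroBased` is hypothetical and not part of PDFBox.

```java
final class PageIndex {
    /**
     * Converts a 1-based page number (as a user would type it) into the
     * 0-based index expected by index-based APIs, validating the range first.
     */
    static int toZeroBased(int pageNumber, int totalPages) {
        if (pageNumber < 1 || pageNumber > totalPages) {
            throw new IllegalArgumentException(
                    "page " + pageNumber + " is outside 1.." + totalPages);
        }
        return pageNumber - 1;
    }

    public static void main(String[] args) {
        // To delete the 2nd page of a 5-page document, pass index 1
        System.out.println(toZeroBased(2, 5)); // 1
    }
}
```

With such a helper, a call like `document.removePage(PageIndex.toZeroBased(2, document.getNumberOfPages()))` makes the intended page explicit and fails fast on an out-of-range number.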
XDocReport
- Merges docx documents and outputs them in other formats (pdf/html…)
- PdfConverter performs the docx-to-PDF conversion
- Built on POI and iText
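// Assumed package names (XDocReport 2.x with iText 2.1.7); they differ in other versions
import java.awt.Color;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import com.lowagie.text.Font;
import com.lowagie.text.pdf.BaseFont;
import fr.opensagres.poi.xwpf.converter.pdf.PdfConverter;
import fr.opensagres.poi.xwpf.converter.pdf.PdfOptions;
import fr.opensagres.xdocreport.itext.extension.font.IFontProvider;
import fr.opensagres.xdocreport.itext.extension.font.ITextFontRegistry;

public class XDocReportTest {
    public static void main(String[] args) throws Exception {
        try (InputStream in = new FileInputStream("template.docx");
             OutputStream out = new FileOutputStream("template.pdf")) {
            XWPFDocument doc = new XWPFDocument(in);
            PdfOptions options = PdfOptions.create();
            // Supply a font with Chinese glyphs so that Chinese text renders in the PDF
            options.fontProvider(new IFontProvider() {
                @Override
                public Font getFont(String familyName, String encoding, float size, int style, Color color) {
                    try {
                        BaseFont bfChinese = BaseFont.createFont(
                                "C:\\Program Files (x86)\\Microsoft Office\\root\\VFS\\Fonts\\private\\STSONG.TTF",
                                BaseFont.IDENTITY_H, BaseFont.EMBEDDED);
                        Font fontChinese = new Font(bfChinese, size, style, color);
                        if (familyName != null) {
                            fontChinese.setFamily(familyName);
                        }
                        return fontChinese;
                    } catch (Throwable e) {
                        e.printStackTrace();
                        // Fall back to the default font registry if the font file cannot be loaded
                        return ITextFontRegistry.getRegistry().getFont(familyName, encoding, size, style, color);
                    }
                }
            });
            PdfConverter.getInstance().convert(doc, out, options);
        }
    }
}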
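The font path in the example above is hard-coded to one Windows installation, so the font provider falls back to the default registry on any machine where that file is missing. As a sketch of one way to make this less fragile, the helper below picks the first readable file from a list of candidate locations, using only the standard library. The class name `FontLocator`, the method `firstReadable`, and the second candidate path `fonts/STSONG.TTF` are hypothetical and chosen for illustration.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

final class FontLocator {
    /** Returns the first candidate that is an existing, readable regular file. */
    static Optional<Path> firstReadable(List<Path> candidates) {
        for (Path candidate : candidates) {
            if (Files.isRegularFile(candidate) && Files.isReadable(candidate)) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<Path> candidates = Arrays.asList(
                // Path used in the example above
                Paths.get("C:\\Program Files (x86)\\Microsoft Office\\root\\VFS\\Fonts\\private\\STSONG.TTF"),
                // Hypothetical location for a font shipped alongside the application
                Paths.get("fonts", "STSONG.TTF"));
        Optional<Path> font = firstReadable(candidates);
        System.out.println(font.map(Path::toString).orElse("no font file found"));
    }
}
```

The resulting path, if present, could then be passed to `BaseFont.createFont` in place of the fixed string.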
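Summary
- PDF operations: use the Apache PDFBox library to parse PDFs, merge them, and delete pages
- To generate or modify a PDF, it is recommended to work on a docx file first and then convert it to PDF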