Introduction To PDF Programming: Leonard Rosenthol Lazerware
Overview
What might you want to do with PDF?
Review of available libraries
Review of the PDF file format
Developing with the Acrobat API
Developing with PDFlib
You are here because…
You’re a programmer looking to expand into
working with PDF.
You’re already programming PDF using some
library and wanted to hear about other
libraries.
There wasn’t anything else interesting to do.
You’re a friend of mine and wanted to heckle.
How I do things
You should all have copies of the presentation
that you received when you walked in.
There is also an electronic copy of this
presentation (PDF format, of course!) on my
website at http://www.lazerware.com/
I’ve left time at the end for Q&A, but please
feel free to ask questions at any time!
What to do with PDF?
Creation
Report generation
Content repurposing
Document Conversion
Manipulation
Adding text or images
Form filling
Appending or removing pages
Imposition
Adding structural elements
• Bookmarks, hyperlinks, etc.
Securing and signing
What else can you do?
Imaging
Printing
Rasterization (conversion to bitmap)
Content extraction/conversion
Text, HTML, XML
PostScript
Review of Libraries
Creation Only
PDFlib
ClibPDF (FastIO)
Panda (StillHQ)
PDF File Creator (FyTek)
PDF in a Box (Synactis)
PDFever (Perl Script Studio)
SanFace PDFLibrary (SanFace)
ReportLab
Libraries (cont)
Creation Only
retepPDF (Peter Mount)
Root River Delta (Root River Systems)
The Big Faceless PDF Library (Big Faceless)
iText (Lowagie)
Creation & Manipulation
PDFLibrary (Glance)
Life*JOVE (Corena)
PJ (Etymon)
activePDF Toolkit (ActivePDF)
Libraries (cont)
Imaging
5D PDFLibrary (Global Graphics)
Ghostscript (Artifex)
Everything
Acrobat SDK
Adobe PDFLibrary
DocuCom PDF Core Library (Zeon)
SPDF (Appligent)
What’s in a PDF?
Peeling the layers of PDF
PDF file
physical container in a file system containing
the PDF document and other data
PDF document (aka page description)
Contains one or more pages, where each page
consists of text, graphics and/or images as
well as hyperlinks, sounds, etc.
“other data”
PDF version, object catalog, etc.
PDF Document Layout
Header
Specifies PDF version
Body
Sequence of objects
XREF
Where to find each object
Trailer
Tells where to find XREF
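To make the four-part layout concrete, here is a minimal Python sketch that locates each part in a PDF held in memory. The function name `describe_layout` and the returned dictionary keys are made up for illustration. It assumes a classic cross-reference table (the `xref` keyword) and does not handle the cross-reference streams that later PDF versions allow.

```python
import re


def describe_layout(data: bytes) -> dict:
    """Locate header, xref table and trailer in a PDF held in memory.

    Illustrative sketch only: assumes a classic 'xref' table and a
    well-formed file.
    """
    # Header: the file starts with "%PDF-" followed by the version.
    header = re.match(rb"%PDF-(\d+\.\d+)", data)
    if header is None:
        raise ValueError("missing %PDF header")

    # End of file: "startxref", the byte offset of the xref table, "%%EOF".
    tail = re.search(rb"startxref\s+(\d+)\s+%%EOF\s*$", data)
    if tail is None:
        raise ValueError("missing startxref / %%EOF")
    xref_offset = int(tail.group(1))

    # The offset must point at the "xref" keyword itself.
    if not data.startswith(b"xref", xref_offset):
        raise ValueError("startxref does not point at an xref table")

    # The trailer dictionary follows the xref entries.
    trailer_offset = data.find(b"trailer", xref_offset)
    if trailer_offset < 0:
        raise ValueError("missing trailer")

    return {
        "version": header.group(1).decode("ascii"),
        "xref_offset": xref_offset,
        "trailer_offset": trailer_offset,
    }
```

Note the reading order: a consumer starts at the end of the file, follows `startxref` to the XREF table, and only then jumps to individual objects in the body.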
Structure of a PDF document
Page 1 ... Page n
• Imageable content
• Thumbnail
• Annotations
Article threads
• Thread n
Named destinations
AcroForm
Smallest PDF
%PDF-1.1
1 0 obj
<<
/Pages 3 0 R
/Type /Catalog
>>
endobj
2 0 obj
<<
/Type /Page
/Parent 3 0 R
>>
endobj
3 0 obj
<<
/Kids [ 2 0 R ]
/Count 1
/Type /Pages
/MediaBox [ 0 0 612 792 ]
>>
endobj
xref
0 5
0000000000 65535 f
0000000015 00000 n
0000000085 00000 n
0000000136 00000 n
0000000227 00000 n
trailer
<<
/Size 5
/Root 1 0 R
/ID[<5181383ede94727bcb32ac27ded71c68><5181383ede94727bcb32ac27ded71c68>]
>>
startxref
277
%%EOF
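The byte offsets in the XREF table have to match the real positions of the objects, which is tedious to maintain by hand. The following Python sketch writes a similar one-page file and computes the offsets as it goes. The function name `build_minimal_pdf` is made up for illustration, and the sketch is not taken from any of the libraries listed earlier. It writes only the three objects shown on this slide and leaves out the optional /ID entry, so its /Size comes out as 4 rather than 5.

```python
def build_minimal_pdf() -> bytes:
    """Build a tiny one-page PDF, computing xref offsets automatically.

    Illustrative sketch: three objects (catalog, page, page tree),
    a classic xref table and a trailer without /ID.
    """
    objects = [
        b"<< /Type /Catalog /Pages 3 0 R >>",                      # object 1
        b"<< /Type /Page /Parent 3 0 R >>",                        # object 2
        b"<< /Type /Pages /Kids [ 2 0 R ] /Count 1 "
        b"/MediaBox [ 0 0 612 792 ] >>",                           # object 3
    ]

    out = bytearray(b"%PDF-1.1\n")          # header
    offsets = []

    # Body: remember where each object starts before writing it.
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))
        out += b"%d 0 obj\n%s\nendobj\n" % (number, body)

    # XREF: one 20-byte entry per object, plus the free entry for object 0.
    xref_offset = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset

    # Trailer: names the root object and says where the xref table starts.
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_offset
    return bytes(out)
```

Writing the result to disk with `open("minimal.pdf", "wb").write(build_minimal_pdf())` should give a file with a single blank page. How strictly a given viewer validates such a bare file varies.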
A look at the SDK
Where to find the “SDK”?
Acrobat Plugins
Mac OS & Windows
Adobe PDFLibrary
Mac OS, Windows, Linux x86, Solaris
SPDF (Appligent)
Mac OS, Windows, Linux (x86 & PPC), Solaris,
AIX, HP-UX, Digital Unix, IBM System/390
DocuCom PDF Core (Zeon)??
Windows
What’s in there?
Not every implementation of the “SDK”
has 100% of the same features (even
between Acrobat and PDFLibrary).