How PDF Works: Gary Staas
Gary Staas
ByteSizeBooks.com
www.pdfdream.com
1 of 35
PDF Dream
Outline
How PDF represents a document
File structure: 4 parts
Document structure: elements that build a document
Painting pages: portray any document
Dictionaries: flexible data structure used throughout PDF
Acrobat Forms: built from dictionaries and annotations
2 of 35
PDF Specification
PDF 1.4
PDF Reference, third edition
Adobe Portable Document Format
Version 1.4
http://partners.adobe.com/asn/developer/acrosdk/docs/filefmtspecs/PDFReference.pdf
3 of 35
Structure levels
File structure
Four parts, one contains most data
Document structure
How document elements represented
Pages, annotations, bookmarks
Cos objects
Building blocks of document elements
Same as PostScript objects
Objects may be used repeatedly
Dictionary: important object type
4 of 35
File structure
Header: version number
Body: most data
Document elements
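A minimal Python sketch of how the four parts (header, body, cross-reference table, trailer) can be located in a file. The tiny two-object file is built in memory purely for illustration, and `locate_parts` is a hypothetical helper, not part of any PDF library.

```python
# Build a tiny illustrative PDF in memory so that byte offsets are exact.
header = b"%PDF-1.4\n"
objects = [
    b"1 0 obj\n<< /Type /Catalog /Pages 2 0 R >>\nendobj\n",
    b"2 0 obj\n<< /Type /Pages /Kids [] /Count 0 >>\nendobj\n",
]

body = b""
offsets = []
for obj in objects:
    offsets.append(len(header) + len(body))  # byte offset of this object
    body += obj

xref_pos = len(header) + len(body)
xref = b"xref\n0 3\n0000000000 65535 f \n"
for off in offsets:
    xref += b"%010d 00000 n \n" % off  # one 20-byte entry per object

trailer = (
    b"trailer\n<< /Size 3 /Root 1 0 R >>\nstartxref\n%d\n%%%%EOF\n" % xref_pos
)
pdf = header + body + xref + trailer


def locate_parts(data: bytes) -> dict:
    """Hypothetical helper: report where each of the four parts starts."""
    version = data[: data.index(b"\n")].decode("ascii")
    sx = data.rindex(b"startxref")
    xref_offset = int(data[sx + len(b"startxref"):].split()[0])
    return {
        "header": version,
        "body": len(version) + 1,
        "xref": xref_offset,
        "trailer": data.rindex(b"trailer"),
    }


parts = locate_parts(pdf)
print(parts["header"], parts["xref"], parts["trailer"])
```

A reader starts at the end of the file: `startxref` gives the offset of the cross-reference table, and the trailer that follows the table points at the Catalog.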
5 of 35
Incremental update
Original file: Header, Body, Xref, Trailer
Update section (modified part): Body, Xref, Trailer
6 of 35
Incremental update
One update section added for each save
File gets bigger after each save
Can revert to previous versions by
chopping off update sections
Digital signatures use this feature
Save As... compacts file
Combines update sections and original
file into one Body, Xref, and Trailer
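Chopping off an update section can be sketched in Python as follows. This assumes each saved revision ends with its own `%%EOF` marker and that the marker never occurs inside stream data; `chop_last_update` is a hypothetical helper, and the byte strings are stand-ins rather than valid PDF content.

```python
def chop_last_update(data: bytes) -> bytes:
    """Return the file as it was before the most recent incremental save."""
    marker = b"%%EOF"
    last = data.rfind(marker)
    if last == -1:
        raise ValueError("no %%EOF marker found")
    prev = data.rfind(marker, 0, last)
    if prev == -1:
        raise ValueError("file has no update section to remove")
    # Keep everything up to and including the previous %%EOF.
    return data[: prev + len(marker)] + b"\n"


# Stand-in bytes: an original revision followed by one update section.
original = b"%PDF-1.4\n(original body)\nxref\ntrailer\nstartxref\n9\n%%EOF\n"
update = b"(update body)\nxref\ntrailer\nstartxref\n60\n%%EOF\n"
reverted = chop_last_update(original + update)
print(reverted == original)
```

Because every cross-reference offset is measured from the start of the file, the earlier revision remains valid once the later bytes are removed.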
7 of 35
Document structure
Document comprised of various elements
Catalog/root
Pages
Annotations
Bookmarks
Each document element has its format
defined in PDF Reference
Example: Info dictionary
<<
/CreationDate (D:20010329220824Z)
/ModDate (D:20011006152637-07'00')
/Producer (Acrobat Distiller 5.0)
/Title (Acrobat SDK Release Notes)
/Creator (FrameMaker 5.5.6p145)
/Author (Adobe Developer Support)
>>
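The date strings in the Info dictionary follow the pattern D:YYYYMMDDHHmmSS plus an optional time zone. A minimal Python sketch of decoding them, assuming only that pattern; `parse_pdf_date` is a hypothetical helper name.

```python
import re
from datetime import datetime, timedelta, timezone

# Year is required; later fields are optional and fall back to defaults.
_DATE_RE = re.compile(
    r"D:(\d{4})(\d{2})?(\d{2})?(\d{2})?(\d{2})?(\d{2})?"
    r"(?:(Z)|([+-])(\d{2})'?(\d{2})?'?)?$"
)


def parse_pdf_date(text: str) -> datetime:
    """Convert a PDF date string into a datetime (naive if no zone given)."""
    m = _DATE_RE.match(text)
    if m is None:
        raise ValueError("not a PDF date string: %r" % text)
    defaults = (None, 1, 1, 0, 0, 0)
    fields = [
        int(g) if g is not None else d
        for g, d in zip(m.groups()[:6], defaults)
    ]
    zulu, sign, off_h, off_m = m.groups()[6:]
    if zulu:
        tz = timezone.utc
    elif sign:
        delta = timedelta(hours=int(off_h), minutes=int(off_m or 0))
        tz = timezone(delta if sign == "+" else -delta)
    else:
        tz = None
    return datetime(*fields, tzinfo=tz)


created = parse_pdf_date("D:20010329220824Z")
modified = parse_pdf_date("D:20011006152637-07'00'")
print(created.isoformat(), modified.isoformat())
```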
9 of 35
Document structure
Trailer points to 2 things
Info dictionary
Catalog (root) contains document
objects
Page tree: all doc pages, typically most of data
Viewer preferences: show/hide toolbar...
Page labels
Form information
Bookmark (outline) tree
Everything else!
10 of 35
Page tree
Allows random page access
[Figure: page tree of nested Pages nodes with Page leaves (pages 34, 35, 36, 37)]
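Random page access works because each Pages node records how many leaf pages sit beneath it, so a reader can skip whole subtrees. A minimal Python sketch using plain dicts as stand-ins for Cos dictionaries; the `Label` key and the `page`, `pages`, and `find_page` helpers are hypothetical and only illustrate the idea.

```python
def page(label):
    """Leaf node standing in for a Page dictionary."""
    return {"Type": "Page", "Label": label}


def pages(*kids):
    """Interior node; Count is the number of leaf pages below it."""
    return {
        "Type": "Pages",
        "Kids": list(kids),
        "Count": sum(k.get("Count", 1) for k in kids),
    }


def find_page(node, index):
    """Return the leaf at zero-based index, skipping subtrees by Count."""
    if node["Type"] == "Page":
        if index != 0:
            raise IndexError("page index out of range")
        return node
    for kid in node["Kids"]:
        count = kid["Count"] if kid["Type"] == "Pages" else 1
        if index < count:
            return find_page(kid, index)
        index -= count  # skip this whole subtree
    raise IndexError("page index out of range")


root = pages(pages(page(34), page(35)), pages(page(36), pages(page(37))))
print(find_page(root, 2)["Label"])
```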
11 of 35
Page
Each page fully self contained for page
independence
Refer to everything needed for page in
one place
Know where to find page data
Cos objects: byte offsets in file from
cross-reference table
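A minimal Python sketch of reading byte offsets out of a classic cross-reference section. It assumes the plain-text table layout (`xref`, then subsections of 20-byte entries); `parse_xref` is a hypothetical helper and the sample offsets are made up.

```python
def parse_xref(section: bytes) -> dict:
    """Map object number -> byte offset for the in-use ('n') entries."""
    tokens = section.split()
    if not tokens or tokens[0] != b"xref":
        raise ValueError("section does not start with 'xref'")
    offsets = {}
    i = 1
    while i < len(tokens) and tokens[i] != b"trailer":
        first, count = int(tokens[i]), int(tokens[i + 1])
        i += 2
        for obj_num in range(first, first + count):
            offset, _generation, flag = tokens[i:i + 3]
            i += 3
            if flag == b"n":  # 'f' marks a free entry
                offsets[obj_num] = int(offset)
    return offsets


sample = (
    b"xref\n0 3\n"
    b"0000000000 65535 f \n"
    b"0000000009 00000 n \n"
    b"0000000058 00000 n \n"
    b"trailer\n<< /Size 3 /Root 1 0 R >>\n"
)
table = parse_xref(sample)
print(table)
```

With the table in hand, a reader can seek straight to any object a page refers to without scanning the rest of the file.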
12 of 35
Page components
Contents: page description, visible part
Resources
Info needed to render page, e.g., fonts
Thumbnail
Bitmap of page
Crop box
Annotations
Additional data on page
Standard: built-in annots
Custom annots, e.g., Acrobat Form
fields
13 of 35
Cos objects
Document elements are built from Cos
objects
Types
Number: integer and real
String: text
Array: list
Dictionary: key-value pairs
Stream: contains data stream
14 of 35
Cos objects: Dictionary
Unordered set of key-value pairs
Database
Inherently extensible
Can represent many data structures
Many things in PDF file are dictionaries
Info dictionary
Pages
Annotations
Much of PDF Ref is dictionary definitions
Undefined entries ignored by Acrobat
Can easily add custom data
15 of 35
Cos objects: Dictionary
Example: Text annotation
<<
/Type /Annot
/Subtype /Text
/Rect [266 116 430 204]
/Contents (Data of text annot)
>>
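The annotation above maps naturally onto a Python dict. A minimal sketch of writing such a dict back out in PDF syntax; the `Name` class and `to_pdf` function are hypothetical helpers, and `XX_ReviewState` is a made-up custom key showing how extra data can sit alongside the standard entries.

```python
class Name(str):
    """Marks a value as a PDF name (written /Value) rather than a string."""


def to_pdf(value) -> str:
    """Serialize a small subset of Cos objects to PDF text."""
    if isinstance(value, Name):
        return "/" + value
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return str(value)
    if isinstance(value, str):
        escaped = (
            value.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
        )
        return "(" + escaped + ")"
    if isinstance(value, list):
        return "[" + " ".join(to_pdf(v) for v in value) + "]"
    if isinstance(value, dict):
        pairs = " ".join("/" + k + " " + to_pdf(v) for k, v in value.items())
        return "<< " + pairs + " >>"
    raise TypeError("unsupported type: %r" % type(value))


annot = {
    "Type": Name("Annot"),
    "Subtype": Name("Text"),
    "Rect": [266, 116, 430, 204],
    "Contents": "Data of text annot",
}
plain = to_pdf(annot)

# Hypothetical custom entry; a viewer that does not know the key ignores it.
annot["XX_ReviewState"] = Name("Pending")
extended = to_pdf(annot)
print(plain)
print(extended)
```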
16 of 35
Cos objects: Stream
Two parts:
dictionary describing stream, e.g.,
length
data, which is usually compressed
Streams in PDF file
Page contents
Images
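The two parts of a stream can be sketched in Python with the standard `zlib` module, whose output matches the FlateDecode filter. `make_stream` and `read_stream` are hypothetical helpers, and the sketch assumes /Length is written as a direct number rather than an indirect reference.

```python
import re
import zlib


def make_stream(data: bytes) -> bytes:
    """Wrap data as a stream: describing dictionary, then compressed bytes."""
    compressed = zlib.compress(data)
    dictionary = b"<< /Length %d /Filter /FlateDecode >>" % len(compressed)
    return dictionary + b"\nstream\n" + compressed + b"\nendstream"


def read_stream(obj: bytes) -> bytes:
    """Use /Length from the dictionary to slice out and inflate the data."""
    length = int(re.search(rb"/Length (\d+)", obj).group(1))
    start = obj.index(b"stream\n") + len(b"stream\n")
    return zlib.decompress(obj[start:start + length])


page_contents = b"BT /F1 12 Tf (PDF sample) Tj ET"
stream_obj = make_stream(page_contents)
print(read_stream(stream_obj) == page_contents)
```

The dictionary's /Length entry is what lets a reader find the end of the binary data without scanning it.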
17 of 35
Examining
Document structure
Annotation dictionary specification
PDF Ref, Section 8.4 Annotations
Provides names of keys
Modify annotations
Flags with annotation attributes
invisible on screen (NoView) - 32
Add custom data
Acrobat can repair file: redundancy
Acrobat can't repair everything!
Can't do this with Touchup tool
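The annotation flags are a bit field stored under the /F key. A minimal Python sketch, with bit values taken from my reading of the annotation flags table in the PDF Reference 1.4 (worth checking against Section 8.4); under that reading the value 32 is the NoView bit. The `AnnotFlag` class itself is a hypothetical convenience.

```python
from enum import IntFlag


class AnnotFlag(IntFlag):
    """Annotation flag bits (values as read from PDF Reference 1.4)."""
    INVISIBLE = 1    # bit 1
    HIDDEN = 2       # bit 2
    PRINT = 4        # bit 3
    NO_ZOOM = 8      # bit 4
    NO_ROTATE = 16   # bit 5
    NO_VIEW = 32     # bit 6
    READ_ONLY = 64   # bit 7


# Start from a printable annotation and also hide it on screen.
flags = AnnotFlag.PRINT | AnnotFlag.NO_VIEW
print(int(flags))                    # value to store under /F
print(AnnotFlag.NO_VIEW in flags)    # test a single bit
```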
18 of 35
Page contents
Visible part of page
Represents any document's appearance
Appearance created with set of operators
that make marks on page
Descended from PostScript
Types of marks on pages
Text
Paths
Images
Contents is ordered list of drawing
operations
Page drawn in order of list
19 of 35
Paint characteristics
Pages marked with paint
Objects can hide objects below them
PDF 1.4 transparency: combine layers
Text is font based: vector graphics
Text and lines are not bitmaps/images
Resolution independent
Images: bitmaps
Page marking operators like
PostScript's and Illustrator's
Illustrator 9.0+ native format is PDF 1.4
20 of 35
Operator format
Postfix notation
<operands> operator
operator typically 1 or 2 letters
operands typically numbers, strings
Types of operators
Set painting color
Draw paths: lines and curves
Draw text
Draw images
73 drawing operators
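Because operands come before their operator, a content stream can be read by collecting operands until an operator token appears. A minimal Python sketch, assuming only numbers, names, and simple literal strings as operands (no arrays, dictionaries, or inline images); `parse_operations` is a hypothetical helper.

```python
import re

STRING = r"\((?:\\.|[^\\()])*\)"          # (literal string)
NAME = r"/[^\s()/]+"                      # /Name
NUMBER = r"[-+]?(?:\d+\.?\d*|\.\d+)"      # 12, -3.5, .25
OPERATOR = r"[A-Za-z'\"][A-Za-z0-9*]*"    # Tf, Tj, re, B*, ...

_TOKEN = re.compile("|".join([STRING, NAME, NUMBER, OPERATOR]))
_NUMBER = re.compile(NUMBER)


def parse_operations(content: str) -> list:
    """Group tokens into (operator, [operands]) pairs in drawing order."""
    operations, operands = [], []
    for token in _TOKEN.findall(content):
        if token[0] in "(/" or _NUMBER.fullmatch(token):
            operands.append(token)
        else:
            operations.append((token, operands))
            operands = []
    return operations


ops = parse_operations("BT /F1 12 Tf (PDF sample) Tj ET")
for operator, operands in ops:
    print(operator, operands)
```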
21 of 35
Color
Specify color space and coordinates
Separate colors for path stroke and fill
Basic color spaces
Gray, RGB, CMYK
Can be Device or Calibrated
Other colorspaces
ICC Based
Pattern
22 of 35
PDF Dream
Paths
Draw lines and curves
Can stroke and/or fill path
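A minimal Python sketch that emits page-marking operators for a rectangle with separate stroke and fill colors. It uses the DeviceRGB operators `RG` (stroke color) and `rg` (fill color), `re` (rectangle path), and `B` (fill then stroke); `rect_path` is a hypothetical helper and the coordinates are arbitrary.

```python
def rect_path(x, y, w, h, stroke_rgb, fill_rgb) -> str:
    """Return content-stream text that fills and strokes one rectangle."""
    lines = [
        "%g %g %g RG" % stroke_rgb,      # stroking color, DeviceRGB
        "%g %g %g rg" % fill_rgb,        # fill (nonstroking) color
        "%g %g %g %g re" % (x, y, w, h), # append rectangle to the path
        "B",                             # fill, then stroke, the path
    ]
    return "\n".join(lines)


content = rect_path(100, 100, 200, 50, (1, 0, 0), (0, 0, 1))
print(content)
```

Swapping `B` for `S` would stroke only, and `f` would fill only.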
Text
Text is characters in a font
Text outlines can be stroked/filled
Example: Text line
Text "PDF sample" in 12 point Times
/F1 12 Tf (PDF sample) Tj
24 of 35
Text ordering
Text doesn't need to be in any order
[Figure: sample paragraph whose text fragments are numbered 1 to 7 in drawing order, which differs from reading order]
PDF is a file format used to represent a document in a manner
independent of the application software, hardware, and operating
system used to create it. A PDF file contains a PDF document and
other supporting data.
A PDF document contains one or more pages. Each page in the
document may contain any combination of text, graphics, and
images in a device- and resolution-independent format. This is the
page description. A PDF document may also contain information
possible only in an electronic representation, such as hypertext links,
sound, and movies.
Resources
Additional information needed to draw
page
Referenced by page contents
Font
Color space
Images
26 of 35
Font
Font is glyph description plus encoding
Glyphs
Actual character or ligature shapes
[Figure: the alphabet abcdefghijklmnopqrstuvwxyz shown as glyphs from several fonts: ZapfDingbats, Symbol, Woodtype, Parisian]
28 of 35
Acrobat Interactive
Forms
Catalog object has AcroForm dictionary
Fields: list of root fields
NeedAppearances: create appearances
for fields without one
CO: calculation order of fields
29 of 35
Field dictionary
Each form field is a dictionary
Field can have Kids
Field identified by a name
Fully qualified name from ancestors
applicant.address.city
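A minimal Python sketch of building a fully qualified name by walking up from a field to its ancestors. Plain dicts stand in for field dictionaries, with `T` holding the partial name and `Parent` linking upward; `qualified_name` is a hypothetical helper.

```python
def qualified_name(field: dict) -> str:
    """Join the partial names (/T) from the root ancestor down to field."""
    parts = []
    node = field
    while node is not None:
        if "T" in node:  # nodes without a partial name add nothing
            parts.append(node["T"])
        node = node.get("Parent")
    return ".".join(reversed(parts))


applicant = {"T": "applicant"}
address = {"T": "address", "Parent": applicant}
city = {"T": "city", "Parent": address}
print(qualified_name(city))
```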
30 of 35
Easily examining
PDF internals
Enfocus Browser plug-in
Shows Cos object structure
Mac and Windows
Look at file
32 of 35
Summary
Three structure levels
File, Document, Cos object
Pages have contents and resources
referenced in one place
Marking operators draw page
Dictionaries are everywhere
Can examine and alter PDF file without
Acrobat
Acrobat form fields are dictionaries
PDF Ref tells you dictionary keys and
structure of document elements
34 of 35