PDFlib 8 Beta Tutorial
Edition for Cobol, C, C++, Java, Perl, PHP, Python, RPG, Ruby, and Tcl
Copyright © 1997-2009 PDFlib GmbH and Thomas Merz. All rights reserved. PDFlib users are granted permission to reproduce printed or digital copies of this manual for internal use.
PDFlib GmbH
Franziska-Bilek-Weg 9, 80339 München, Germany
www.pdflib.com
phone +49 89 452 33 84-0
fax +49 89 452 33 84-99
If you have questions check the PDFlib mailing list and archive at tech.groups.yahoo.com/group/pdflib
Licensing contact: [email protected]
Support for commercial PDFlib licensees: [email protected] (please include your license number)
This publication and the information herein is furnished "as is", is subject to change without notice, and should not be construed as a commitment by PDFlib GmbH. PDFlib GmbH assumes no responsibility or liability for any errors or inaccuracies, makes no warranty of any kind (express, implied or statutory) with respect to this publication, and expressly disclaims any and all warranties of merchantability, fitness for particular purposes and noninfringement of third party rights.
PDFlib and the PDFlib logo are registered trademarks of PDFlib GmbH. PDFlib licensees are granted the right to use the PDFlib name and logo in their product documentation. However, this is not required. Adobe, Acrobat, PostScript, and XMP are trademarks of Adobe Systems Inc. AIX, IBM, OS/390, WebSphere, iSeries, and zSeries are trademarks of International Business Machines Corporation. ActiveX, Microsoft, OpenType, and Windows are trademarks of Microsoft Corporation. Apple, Macintosh and TrueType are trademarks of Apple Computer, Inc. Unicode and the Unicode logo are trademarks of Unicode, Inc. Unix is a trademark of The Open Group. Java and Solaris are trademarks of Sun Microsystems, Inc. HKS is a registered trademark of the HKS brand association: Hostmann-Steinberg, K+E Printing Inks, Schmincke. Other company product and service names may be trademarks or service marks of others.
PANTONE colors displayed in the software application or in the user documentation may not match PANTONE-identified standards. Consult current PANTONE Color Publications for accurate color. PANTONE and other Pantone, Inc. trademarks are the property of Pantone, Inc. © Pantone, Inc., 2003. Pantone, Inc. is the copyright owner of color data and/or software which are licensed to PDFlib GmbH to distribute for use only in combination with PDFlib Software. PANTONE Color Data and/or Software shall not be copied onto another disk or into memory unless as part of the execution of PDFlib Software.
PDFlib contains modified parts of the following third-party software:
> ICClib, Copyright 1997-2002 Graeme W. Gill
> GIF image decoder, Copyright 1990-1994 David Koblas
> PNG image reference library (libpng), Copyright 1998-2004 Glenn Randers-Pehrson
> Zlib compression library, Copyright 1995-2002 Jean-loup Gailly and Mark Adler
> TIFFlib image library, Copyright 1988-1997 Sam Leffler, Copyright 1991-1997 Silicon Graphics, Inc.
> Cryptographic software written by Eric Young, Copyright 1995-1998 Eric Young ([email protected])
> Independent JPEG Group's JPEG software, Copyright 1991-1998, Thomas G. Lane
> Cryptographic software, Copyright 1998-2002 The OpenSSL Project (www.openssl.org)
> Expat XML parser, Copyright 1998, 1999, 2000 Thai Open Source Software Center Ltd
> ICU International Components for Unicode, Copyright 1995-2009 International Business Machines Corporation and others
PDFlib contains the RSA Security, Inc. MD5 message digest algorithm.
Contents
0 Applying the PDFlib License Key
1 Introduction
1.1 Roadmap to Documentation and Samples
1.2 PDFlib Programming
1.3 What's new in PDFlib 8?
1.3.1 PDF Features for Acrobat 9
1.3.2 Font Handling and Text Output
1.3.3 PDFlib Block Plugin and PDFlib Personalization Server (PPS)
1.3.4 Other important Features
1.4 Features in PDFlib/PDFlib+PDI/PPS 8
1.5 Availability of Features in different Products
2 PDFlib Language Bindings
3 PDFlib Programming
3.1 General Programming
3.1.1 Exception Handling
3.1.2 The PDFlib Virtual File System (PVF)
3.1.3 Resource Configuration and File Searching
3.1.4 Generating PDF Documents in Memory
3.1.5 Large File Support
3.1.6 Using PDFlib on EBCDIC-based Platforms
3.2 Page Descriptions
3.2.1 Coordinate Systems
3.2.2 Page Size
3.2.3 Direct Paths and Path Objects
3.2.4 Templates
3.2.5 Referenced Pages from external PDF Documents
3.3 Working with Color
3.3.1 Patterns and Smooth Shadings
3.3.2 Spot Colors
3.3.3 Color Management and ICC Profiles
3.4 Interactive Elements
3.4.1 Links, Bookmarks, and Annotations
3.4.2 Form Fields and JavaScript
3.5 Geospatial PDF
3.5.1 Using GeoPDF in Acrobat
3.5.2 Geographic and projected Coordinate Systems
3.5.3 Coordinate System Examples
3.5.4 GeoPDF restrictions in Acrobat 9
4.3 Unicode and String Handling in PDFlib
4.3.1 String Types in PDFlib
4.3.2 Strings in Unicode-aware Language Bindings
4.3.3 Strings in non-Unicode-aware Language Bindings
4.3.4 Unicode-compatible Fonts
4.4 8-Bit Encodings
4.5 Addressing Characters and Glyphs
4.5.1 Escape Sequences
4.5.2 Character References and Glyph Name References
4.5.3 Glyph ID Addressing
4.6 Chinese, Japanese, and Korean Encodings
5 Font Handling
5.1 Overview of Fonts and Encodings
5.1.1 Supported Font Formats
5.1.2 Font Encodings
5.2 Font Format Details
5.2.1 PostScript Type 1 Fonts
5.2.2 TrueType and OpenType Fonts
5.2.3 SING (Gaiji) Fonts
5.2.4 User-Defined (Type 3) Fonts
5.3 Locating, Embedding and Subsetting Fonts
5.3.1 Searching for Fonts
5.3.2 Host Fonts on Windows and Mac
5.3.3 Font Embedding
5.3.4 Font Subsetting
5.4 Fallback Fonts
5.4.1 Purpose of Fallback Fonts
5.4.2 Fallback Fonts in common Situations
5.5 Fonts and Glyphs
5.5.1 Glyph Checking and Substitution
5.5.2 The Euro Glyph
5.5.3 Symbol Fonts and Font-specific Encodings
5.6 Querying Encodings and Fonts
5.6.1 Font-independent Encoding, Unicode, and Glyph Name Queries
5.6.2 Font-specific Encoding, Unicode, and Glyph Name Queries
5.6.3 Querying Codepage Coverage and Fallback Fonts
6 Text Output
6.1 Text Output Methods
6.2 Font Metrics and Text Variations
6.2.1 Font and Glyph Metrics
6.2.2 Kerning
6.2.3 Text Variations
6.3 Complex Script Output
6.3.1 Complex Scripts
6.3.2 Shaping
6.3.3 Bidirectional Formatting
6.3.4 Arabic Text Formatting
6.3.5 Advanced Line Breaking
6.4 Chinese, Japanese, and Korean Text Output
6.4.1 Standard CJK Fonts
6.4.2 Custom CJK Fonts
6.4.3 EUDC and SING Fonts for Gaiji Characters
6.4.4 OpenType Features for improved CJK Text Output
6.5 OpenType Layout Features
6.5.1 Supported OpenType Features
6.5.2 OpenType Features in PDFlib
7.2 Importing PDF Pages with PDI (PDF Import Library)
7.2.1 PDI Features and Applications
7.2.2 Using PDI Functions with PDFlib
7.2.3 Acceptable PDF Documents
7.3 Placing Images and imported PDF Pages
7.3.1 Simple Object Placement
7.3.2 Positioning an Object in a Box
7.3.3 Fitting an Object into a Box
7.3.4 Orientating an Object
7.3.5 Rotating an Object
7.3.6 Adjusting the Page Size
7.3.7 Querying Information about placed Images and PDF Pages
8.2 Multi-Line Textflows
8.2.1 Placing Textflows in the Fitbox
8.2.2 Paragraph Formatting Options
8.2.3 Inline Option Lists and Macros
8.2.4 Tab Stops
8.2.5 Numbered Lists and Paragraph Spacing
8.2.6 Control Characters, Character Mapping, and Symbol Fonts
8.2.7 Hyphenation
8.2.8 Controlling the Linebreak Algorithm
8.2.9 Wrapping Text around Paths and Images
8.3 Table Formatting
8.3.1 Placing a Simple Table
8.3.2 Contents of a Table Cell
8.3.3 Table and Column Widths
8.3.4 Mixed Table Contents
8.3.5 Table Instances
8.3.6 Table Formatting Algorithm
8.4 Matchboxes
8.4.1 Decorating a Textline
8.4.2 Using Matchboxes in a Textflow
8.4.3 Matchboxes and Images
9.2 Handling Basic PDF Data Types
9.3 Composite Data Structures and IDs
9.4 Path Syntax
10.2 Encrypted PDF
10.2.1 Strengths and Weaknesses of PDF Security
10.2.2 Protecting Documents with PDFlib
10.3 Web-Optimized (Linearized) PDF
10.4 PDF/X for Print Production
10.4.1 The PDF/X Family of Standards
10.4.2 Generating PDF/X-conforming Output
10.4.3 Importing PDF/X Documents with PDI
10.5 PDF/A for Archiving
10.5.1 The PDF/A Standards
10.5.2 Generating PDF/A-conforming Output
10.5.3 Importing PDF/A Documents with PDI
10.5.4 Color Strategies for creating PDF/A
10.5.5 XMP Document Metadata for PDF/A
10.5.6 PDF/A Validation
10.5.7 Viewing PDF/A Documents in Acrobat
10.6 Tagged PDF
10.6.1 Generating Tagged PDF with PDFlib
10.6.2 Creating Tagged PDF with direct Text Output and Textflows
10.6.3 Activating Items for complex Layouts
10.6.4 Using Tagged PDF in Acrobat
11.2 Overview of the PDFlib Block Concept
11.2.1 Separation of Document Design and Program Code
11.2.2 Block Properties
11.2.3 Why not use PDF Form Fields?
11.3 Creating Blocks with the PDFlib Block Plugin
11.3.1 Creating Blocks
11.3.2 Editing Block Properties
11.3.3 Copying Blocks between Pages and Documents
11.3.4 Converting PDF Form Fields to PDFlib Blocks
11.4 Previewing PDFlib Blocks in Acrobat
11.5 Filling PDFlib Blocks with PPS
11.6 Block Properties
11.6.1 Administrative Properties
11.6.2 Rectangle Properties
11.6.3 Appearance Properties
11.6.4 Text Preparation Properties
11.6.5 Text Formatting Properties
11.6.6 Object Fitting Properties
11.6.7 Properties for default Contents
11.6.8 Custom Properties
11.7 Querying Block Names and Properties with pCOS
11.8 PDFlib Block Specification
11.8.1 PDF Object Structure for PDFlib Blocks
11.8.2 Block Dictionary Keys
11.8.3 Generating PDFlib Blocks with pdfmarks
> In Perl:
PDF_set_parameter($p, "license", "...your license key...");
> In RPG:
c                   callp     RPDF_set_parameter(p:%ucs2('license'):
c                             %ucs2('...your license key...'))
> In Tcl:
PDF_set_parameter $p "license" "...your license key..."
Working with a license file. As an alternative to supplying the license key with a runtime call, you can enter the license key in a text file in the following format (you can use the license file template licensekeys.txt, which is contained in all PDFlib distributions):
PDFlib license file 1.0
# Licensing information for PDFlib GmbH products
PDFlib 8.0.0beta1 ...your license key...
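Because the license file is plain text, a deployment script can generate it instead of having someone edit the template by hand. The following Python sketch writes a version 1.0 license file in the layout shown above. The function names render_license_file and write_license_file are hypothetical. The single space between product name, version, and key is also an assumption, so compare the result with the licensekeys.txt template from your distribution before relying on it.

```python
from pathlib import Path

# Header and comment lines of the version 1.0 license file format.
HEADER = "PDFlib license file 1.0"
COMMENT = "# Licensing information for PDFlib GmbH products"


def render_license_file(entries):
    """Return license file text for an iterable of (product, version, key) tuples.

    Assumption: the fields of a key line are separated by a single space;
    check this against the licensekeys.txt template of your distribution.
    """
    lines = [HEADER, COMMENT]
    for product, version, key in entries:
        lines.append(f"{product} {version} {key}")
    return "\n".join(lines) + "\n"


def write_license_file(path, entries):
    """Write the rendered license file (ASCII text) to path and return it as a Path."""
    target = Path(path)
    target.write_text(render_license_file(entries), encoding="ascii")
    return target


if __name__ == "__main__":
    # Print a file for a single product; the key is a placeholder.
    print(render_license_file([("PDFlib", "8.0.0beta1", "...your license key...")]), end="")
```

Each tuple produces one key line, so keys for several products or platforms end up on separate lines of the same file. You can then pass the path of the written file to PDFlib through the licensefile parameter or the PDFLIBLICENSEFILE environment variable.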
The license file may contain license keys for multiple PDFlib GmbH products on separate lines. It may also contain license keys for multiple platforms so that the same license file can be shared among platforms.
Next, you must inform PDFlib about the license file, either by setting the licensefile parameter immediately after instantiating the PDFlib object (i.e., after PDF_new( ) or an equivalent call) with a function call similar to the following:
> In C and Python:
PDF_set_parameter(p, "licensefile", "/path/to/licensekeys.txt");
> In Tcl:
PDF_set_parameter $p "licensefile" "/path/to/licensekeys.txt"
Alternatively, you can set the environment variable PDFLIBLICENSEFILE to point to your license file. On Windows use the system control panel and choose System, Advanced, Environment Variables; on Unix apply a command similar to the following:
export PDFLIBLICENSEFILE=/path/to/licensekeys.txt
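If you prefer not to set PDFLIBLICENSEFILE system-wide, a launcher script can set it only for the PDFlib application it starts. The following minimal Python sketch assumes that the PDFlib application runs as a separate process. The helper names and the program name my_pdflib_app are hypothetical placeholders.

```python
import os
import subprocess


def env_with_license_file(license_path, base_env=None):
    """Return a copy of an environment with PDFLIBLICENSEFILE set.

    base_env defaults to the current process environment; the original
    mapping is copied and left unchanged.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["PDFLIBLICENSEFILE"] = str(license_path)
    return env


def run_with_license_file(command, license_path):
    """Run command (a list of arguments) with PDFLIBLICENSEFILE set for that process only."""
    return subprocess.run(command, env=env_with_license_file(license_path), check=True)


# Example (hypothetical program name):
# run_with_license_file(["./my_pdflib_app"], "/path/to/licensekeys.txt")
```

Only the child process sees the variable. Other applications on the system, and the launcher itself, are not affected.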
On Windows you can also enter the name of the license file in the following registry key:
HKLM\Software\PDFlib\PDFLIBLICENSEFILE
Note Be careful when manually accessing the registry on 64-bit Windows systems: as usual, 64-bit PDFlib binaries will work with the 64-bit view of the Windows registry, while 32-bit PDFlib binaries running on a 64-bit system will work with the 32-bit view of the registry. If you must add registry keys for a 32-bit product manually, make sure to use the 32-bit version of the regedit tool. It can be invoked as follows from the Start, Run... dialog:
%systemroot%\syswow64\regedit
Default locations for license and resource files on Unix systems. On Unix, Linux and Mac OS X systems some directories will be searched by default for license and resource files even without specifying any path and directory names. Before searching and reading the UPR file, the following directories will be searched (in this order):
<rootpath>/PDFlib/PDFlib/8.0/resource/icc
<rootpath>/PDFlib/PDFlib/8.0/resource/fonts
<rootpath>/PDFlib/PDFlib/8.0/resource/cmap
<rootpath>/PDFlib/PDFlib/8.0
<rootpath>/PDFlib/PDFlib
<rootpath>/PDFlib
where <rootpath> will first be replaced with /usr/local and then with the HOME directory. This feature can be used to work with a license file, UPR file, or resources without setting any environment variables or runtime parameters.
Multi-system license files on iSeries and zSeries. License keys for iSeries and zSeries are system-specific and therefore cannot be shared among multiple systems. To facilitate resource sharing and allow a single license file to be shared by multiple systems, the following license file format can be used to hold multiple system-specific keys in a single file:
PDFlib license file 2.0
# Licensing information for PDFlib GmbH products
PDFlib 8.0.0beta1 ...your license key... ...CPU ID1...
PDFlib 8.0.0beta1 ...your license key... ...CPU ID2...
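PDFlib selects the key that matches the current system from such a file by itself. The following Python sketch only illustrates how the version 2.0 format is organized, for example to check which CPU IDs a shared license file covers. It assumes whitespace-separated fields, with the CPU ID as the fourth and last field of each key line, and treats lines starting with '#' as comments. The function names are hypothetical and not part of the PDFlib API.

```python
def parse_multi_system_license(text):
    """Map CPU ID -> (product, version, key) for a version 2.0 license file.

    Assumptions: fields are separated by whitespace, the CPU ID is the
    last of exactly four fields, and lines starting with '#' are comments.
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines or lines[0] != "PDFlib license file 2.0":
        raise ValueError("not a version 2.0 license file")
    keys = {}
    for line in lines[1:]:
        if line.startswith("#"):
            continue  # skip comment lines
        fields = line.split()
        if len(fields) != 4:
            raise ValueError(f"unexpected key line: {line!r}")
        product, version, key, cpu_id = fields
        keys[cpu_id] = (product, version, key)
    return keys


def key_for_cpu(text, cpu_id):
    """Return the license key registered for cpu_id, or None if there is none."""
    entry = parse_multi_system_license(text).get(cpu_id)
    return entry[2] if entry else None
```

For example, key_for_cpu(text, "1234") returns the key on the line that ends with CPU ID 1234, and returns None if the file contains no key for that system.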
Note the changed version number in the first line and the presence of multiple license keys, each followed by the corresponding CPU ID.
Working with license files on iSeries. On iSeries systems the license file must be encoded in ASCII (see the asciifile parameter). The following command sets the PDFLIBLICENSEFILE environment variable to point to a suitable license file:
ADDENVVAR ENVVAR(PDFLIBLICENSEFILE) VALUE('/PDFLIB/8.0.0beta1/licensefile.txt') LEVEL(*SYS)
Adding the license key to the Windows registry. The Windows installer will add the supplied license key to the registry. Instead of using the installer you can also add the license key to the registry manually at the following registry location:
HKEY_LOCAL_MACHINE\SOFTWARE\PDFlib\PDFlib\8.0.0beta1
Updates and Upgrades. If you purchased an update (change from an older version of a product to a newer version of the same product) or upgrade (change from PDFlib to PDFlib+PDI or PPS, or from PDFlib+PDI to PPS), you must apply the new license key that you received for your update or upgrade. The old license key for the previous product must no longer be used. Note that license keys will work for all maintenance releases of a particular product version; as far as licensing is concerned, all versions 8.0.x are treated the same.
Evaluating features which are not yet licensed. You can fully evaluate all features by using the software without any license key applied. However, once you have applied a valid license key for a particular product, features of a higher product category will no longer be available. For example, if you have installed a valid PDFlib license key, the PDI functionality will no longer be available for testing. Similarly, after installing a PDFlib+PDI license key the personalization features (Block functions) will no longer be available. When a license key for a product has already been installed, you can replace it with the dummy license string "0" (zero) to enable functionality of a higher product class for evaluation. This will enable the previously disabled functions, and re-activate the demo stamp across all pages.
Licensing options. Different licensing options are available for PDFlib use on one or more servers, and for redistributing PDFlib with your own products. We also offer support and source code contracts. Licensing details and the PDFlib purchase order form can be found in the PDFlib distribution. Please contact us if you are interested in obtaining a commercial PDFlib license, or have any questions:
PDFlib GmbH, Licensing Department
Franziska-Bilek-Weg 9, 80339 München, Germany
www.pdflib.com
phone +49 89 452 33 84-0
fax +49 89 452 33 84-99
Licensing contact: [email protected]
Support for PDFlib licensees: [email protected]
1 Introduction
1.1 Roadmap to Documentation and Samples
We provide the following material to assist you in using PDFlib products successfully:
> The mini samples (hello, image, pdfclock, etc.) are available in all packages and for all language bindings. They provide minimalistic sample code for text output, images, and vector graphics. The mini samples are mainly useful for testing your PDFlib installation, and for getting a very quick overview of writing PDFlib applications.
> The starter samples are contained in all packages and are available for a variety of language bindings. They provide a useful generic starting point for important topics, and cover simple text and image output, Textflow and table formatting, PDF/A and PDF/X creation and other topics. The starter samples demonstrate basic techniques for achieving a particular goal with PDFlib products. It is strongly recommended to take a look at the starter samples.
> The PDFlib Tutorial (this manual), which is contained in all packages as a single PDF document, explains important programming concepts in more detail, including small pieces of sample code. If you start extending your code beyond the starter samples you should read up on relevant topics in the PDFlib Tutorial.
Note Most examples in this PDFlib Tutorial are provided in the Java language (except for the language-specific samples in Chapter 2, PDFlib Language Bindings, page 25, and a few C-specific samples which are marked as such). Although syntax details vary with each language, the basic concepts of PDFlib programming are the same for all language bindings.
> The PDFlib Reference, which is contained in all packages as a single PDF document, contains a concise description of all functions, parameters, and options which together comprise the PDFlib application programming interface (API). The PDFlib Reference is the definitive source for looking up parameter details, supported options, input conditions, and other programming rules which must be observed. Note that some other reference documents are incomplete, e.g. the Javadoc API listing for PDFlib and the PDFlib function listing on php.net. Make sure to always use the full PDFlib Reference when working with PDFlib.
> The PDFlib Cookbook is a collection of PDFlib coding fragments for solving specific problems. Most Cookbook examples are written in the Java language, but can easily be adjusted to other programming languages since the PDFlib API is almost identical for all supported language bindings. The PDFlib Cookbook is maintained as a growing list of sample programs. It is available at the following URL:
www.pdflib.com/pdflib-cookbook/
> The pCOS Cookbook is a collection of code fragments for the pCOS interface which is contained in PDFlib+PDI and PPS. It is available at the following URL:
www.pdflib.com/pcos-cookbook/
> PDFlib TET (Text Extraction Toolkit) is a separate product for extracting text and images from PDF documents. It can be combined with PDFlib+PDI to process PDF documents based on their contents. The TET Cookbook is a collection of code fragments for TET. It contains a group of samples which demonstrate the combination of TET and PDFlib+PDI, e.g. add Web links or bookmarks based on the text on the page, highlight search terms, split documents based on text, create a table of contents, etc. The TET Cookbook is available at the following URL:
www.pdflib.com/tet-cookbook/
Chapter 1: Introduction
> PDFlib can be integrated directly in the application generating the data.
> As an implication of this straightforward process, PDFlib is the fastest PDF-generating method, making it perfectly suited for the Web.
> PDFlib's thread-safety as well as its robust memory and error handling support the implementation of high-performance server applications.
> PDFlib is available for a variety of operating systems and development environments.
Requirements for using PDFlib. PDFlib makes PDF generation possible without wading through the PDF specification. While PDFlib tries to hide technical PDF details from the user, a general understanding of PDF is useful. In order to make the best use of PDFlib, application programmers should ideally be familiar with the basic graphics model of PostScript (and therefore PDF). However, a reasonably experienced application programmer who has dealt with any graphics API for screen display or printing shouldn't have much trouble adapting to the PDFlib API.
Fallback fonts. Fallback fonts are a powerful mechanism for dealing with a variety of font and encoding-related restrictions. You can mix and match fonts, pull missing glyphs from another font, extend encodings, etc. Fallback fonts can adjust the size of individual glyphs automatically to account for design differences in the combined fonts.
OpenType layout features. OpenType layout features add intelligence to a TrueType or OpenType font in the form of additional tables in the font file. These tables describe advanced typographic features such as ligatures, small capitals, swash characters, etc. They also support advanced CJK text output with halfwidth, fullwidth, and proportional glyphs, alternate forms, and many others.
Retain fonts across documents. Fonts and associated data can be kept in memory after the generated document is finished. This improves performance since the font doesn't have to be parsed again for the next document, while still doing document-specific processing such as font subsetting.
SING fonts for CJK Gaiji characters. The Japanese term "Gaiji" refers to custom characters (e.g. family or place names) which are in common use, but are not included in any encoding standard. Adobe's SING font architecture ("glyphlets") solves the Gaiji problem for CJK text. PDFlib supports SING fonts as well as the related Microsoft concept of EUDC fonts (end-user defined fonts). Using the fallback font feature, SING and EUDC fonts can be blended into an existing font.
Redesigned font engine. PDFlib's font engine has been redesigned and streamlined, resulting in a variety of Unicode and encoding-related advantages as well as a general performance speedup and reduced memory requirements. Due to the redesign some restrictions could be eliminated and the functionality of existing features extended. For example, it is now possible to address more than 256 glyphs in Type 1 or Type 3 fonts.
Wrap text around image clipping paths. The Textflow formatting engine wraps text around arbitrary paths and can also use the clipping path of an imported TIFF or JPEG image. This way multi-line text can be wrapped around an image.
Text on a path. Text can be placed on arbitrary vector paths consisting of an arbitrary mixture of straight line segments, curves, and arcs. The paths can be constructed programmatically. Alternatively, the clipping paths from TIFF or JPEG images can be extracted and used as text paths.
Clone PDF/A or PDF/X status of the Block container. When generating Block previews based on PDF/A or PDF/X documents, the Block Plugin can clone all relevant aspects of the standard, e.g. standard identification, output intent, and metadata. If a Block filling operation in PDF/A or PDF/X cloning mode would violate the selected standard (e.g. because a default image uses RGB color space although the document does not contain a suitable output intent) an error message will be displayed. This way users can catch potential standard violations very early in the workflow.
Redesigned user interface and snap-to-grid. The user interface of the PDFlib Block Plugin has been restructured to facilitate access to the large number of existing and new Block properties. The new snap-to-grid feature is useful for quickly laying out Blocks according to a design raster.
Additional Block properties. More Block properties have been added to the Block Plugin and PPS, e.g. for specifying transparency of text, image, or PDF contents placed in a Block.
Leverage PDFlib 8 features with Blocks. Relevant new features of PDFlib 8 such as text output for complex scripts and OpenType layout features can be activated directly with Block properties. For example, Blocks can be filled with Arabic or Hindi text.
Improvements in existing functions. The list below mentions some of the most important improvements of existing features in PDFlib 8:
> query image details with PDF_begin_template_ext( )
> PPS and Block Plugin: additional Block properties which make new PDFlib features accessible via PDFlib Blocks
> Unicode filenames on Unix systems
> Table formatter: place path objects, annotations, and form fields in table cells
> Textflow: additional formatting control options, advanced language-specific linebreaking
> shadow text
> retain XMP metadata in imported images
> many improvements in PDF_info_font( )
> additional options for creating annotations
> Configurable string data type for the C++ binding, e.g. wstring for Unicode support
There are many more new features; see the PDFlib Reference for details.
Table 1.1 Feature list for PDFlib, PDFlib+PDI, and the PDFlib Personalization Server (PPS)
> Fallback fonts (pull missing glyphs from an auxiliary font)¹
> Retain fonts across documents to increase performance¹
Text output:
> Text output in different fonts; underlined, overlined, and strikeout text
> Glyphs in a font can be addressed by numerical value, Unicode value, or glyph name¹
> Kerning for improved character spacing
> Artificial bold, italic, and shadow¹ text
> Create text on a path¹
> Proportional widths for standard CJK fonts¹
> Configurable replacement of missing glyphs
Internationalization:
> Unicode strings for page content, interactive elements, and file names¹; UTF-8, UTF-16, and UTF-32 formats
> Support for a variety of 8-bit and legacy multi-byte CJK encodings (e.g. SJIS; Big5)
> Fetch code pages from the system (Windows, IBM eServer iSeries and zSeries)
> Standard and custom CJK fonts and CMaps for Chinese, Japanese, and Korean text
> Character shaping for complex scripts, e.g. Arabic, Thai, Devanagari¹
> Bidirectional text formatting for right-to-left scripts, e.g. Arabic and Hebrew¹
> Embed Unicode information in PDF for proper text extraction in Acrobat
Images:
> Embed BMP, GIF, PNG, TIFF, JBIG2¹, JPEG, JPEG 2000¹, and CCITT raster images
> Automatic detection of image file formats
> Query image information (pixel size, resolution, ICC profile, clipping path, etc.)¹
> Interpret clipping paths in TIFF and JPEG images
> Interpret alpha channel (transparency) in TIFF and PNG images¹
> Image masks (transparent images with a color applied), colorize images with a spot color
Color:
> Grayscale, RGB (numerical, hexadecimal strings, HTML color names), CMYK, CIE L*a*b* color
> Integrated support for PANTONE colors (incl. PANTONE Goe)¹ and HKS colors
> User-defined spot colors
Color management:
> ICC-based color with ICC profiles; support for ICC 4¹ profiles
> Rendering intent for text, graphics, and raster images
> Default gray, RGB, and CMYK color spaces to remap device-dependent colors
> ICC profiles as output intent for PDF/A and PDF/X
Archiving:
> PDF/A-1a and PDF/A-1b (ISO 19005-1)
> XMP extension schemas for PDF/A-1
Graphic arts:
> PDF/X-1a, PDF/X-3, PDF/X-4¹, PDF/X-4p¹, PDF/X-5p¹, PDF/X-5pg¹
> Embedded or externally referenced¹ output intent ICC profile
> External graphical content (referenced pages) for PDF/X-5p and PDF/X-5pg¹
> Copy output intent from imported PDF documents (only PDFlib+PDI and PPS)
> Create OPI 1.3 and OPI 2.0 information for imported images
> Separation information (PlateColor)
> Settings for text knockout, overprinting etc.
Textflow Formatting:
> Format text into one or more rectangular or arbitrarily shaped areas with hyphenation (user-supplied hyphenation points required), font and color changes, justification methods, tabs, leaders, control commands; wrap text around images
> Advanced line-breaking with language-specific processing
> Flexible image placement and formatting
> Wrap text around images or image clipping paths¹
Table formatting:
> Table formatter places rows and columns and automatically calculates their sizes according to a variety of user preferences. Tables can be split across multiple pages.
> Table cells can hold single- or multi-line text, images, PDF pages, path objects, annotations, and form fields
> Table cells can be formatted with ruling and shading options
> Flexible stamping function
> Matchbox concept for referencing the coordinates of placed images or other objects
Security:
> Encrypt PDF output with RC4 (40/128 bit) or AES encryption algorithms (128/256¹ bit)
> Unicode passwords¹
> Specify permission settings (e.g. printing or copying not allowed)
> Import encrypted documents (master password required; only PDFlib+PDI and PPS)
Interactive elements:
> Create form fields with all field options and JavaScript
> Create actions for bookmarks, annotations, page open/close and other events
> Create bookmarks with a variety of options and controls
> Page transition effects, such as shades and mosaic
> Create all PDF annotation types, such as PDF links, launch links (other document types), Web links
> Named destinations for links, bookmarks, and document open action
> Create page labels (symbolic names for pages)
Multimedia:
> Embed 3D animations in U3D format
GeoPDF:
> Create PDF with geospatial reference information¹
Tagged PDF:
> Create Tagged PDF and structure information for accessibility, page reflow, and improved content repurposing; links and other annotations can be integrated in the document structure
> Easily format large amounts of text for Tagged PDF
Metadata:
> Document information: standard fields (Title, Subject, Author, Keywords) and user-defined fields
> Create XMP metadata from document info fields or from client-supplied XMP streams
> Process XMP image metadata in TIFF, JPEG, and JPEG2000 images
Programming:
> Language bindings for Cobol, COM, C, C++¹, Java, .NET, Perl, PHP, Python, REALbasic, RPG, Ruby, Tcl
> Virtual file system for supplying data in memory, e.g., images from a database
1. New or considerably improved in PDFlib/PDFlib+PDI/PPS 8
PDFlib
feature basic PDF generation linearized (Web-optimized) PDF optimize PDF (only relevant for inefficient client code and non-optimized imported PDF documents) Referenced PDF, PDF/X-5g and PDF/X-5pg Parsing PDF documents for Portfolio creation PDF import (PDI) Query information from PDF with pCOS Variable data processing and personalization with Blocks PDFlib Block plugin for Acrobat
API functions and options all except those listed below linearize option in PDF_end_document( ) optimize option in PDF_end_document( )
X X
X X X
X X
reference option in PDF_begin_template_ext( ) and PDF_open_pdi_page( ) password option in PDF_add_portfolio_file( ) all PDI functions all pCOS functions all PPS functions for Block filling interactively create PDFlib blocks for use with PPS
X1 X1
X X X X
X X X X X X
1. Not available with source code licenses for PDFlib since PDI is required internally for this feature
PPS
const char *
const char * STRING byte[ ] string string string data string byte array
1. C language NULL string values and empty strings are considered equivalent. 2. The C++ API can be customized via instantiation of the std::basic_string template. For example, the API can be switched to std::string to achieve compatibility with older applications. Alternatively, user-defined data types can also be used as the basis of the string type used in the API (see Section 2.5, C++ Binding, page 31). 3. Cobol programs must use abbreviated names for the PDFlib functions.
All Cobol strings passed to the PDFlib API should be defined with one extra byte of storage for the expected LOW-VALUES (NULL) terminator.
Return values. The return value of PDFlib API functions will be supplied in an additional ret parameter which is passed by reference. It will be filled with the result of the respective function call. A zero return value means the function call executed successfully; other values signal an error, and PDF generation cannot be continued. Functions which do not return any result (C functions with a void return type) don't use this additional parameter.
Error handling. PDFlib exception handling is not available in the Cobol language binding. Instead, all API functions support an additional return code (rc) parameter which signals errors. The rc parameter is passed by reference, and will be used to report problems. A non-zero value indicates that the function call failed.
2.4 C Binding
PDFlib itself is written in ANSI C. In order to use the PDFlib C binding, you can use a static or shared library (DLL on Windows and MVS), and you need the central PDFlib include file pdflib.h for inclusion in your PDFlib client source modules. Alternatively, pdflibdl.h can be used for dynamically loading the PDFlib DLL at runtime (see below for details).
Using PDFlib as a DLL loaded at runtime. While most clients will use PDFlib as a statically bound library or a dynamic library which is bound at link time, you can also load the PDFlib DLL at runtime and dynamically fetch pointers to all API functions. This is especially useful for loading the PDFlib DLL only on demand, and on MVS, where the library is customarily loaded as a DLL at runtime without explicitly linking against PDFlib. PDFlib supports a special mechanism to facilitate this dynamic usage. It works according to the following rules:
> Include pdflibdl.h instead of pdflib.h.
> Use PDF_new_dl( ) and PDF_delete_dl( ) instead of PDF_new( ) and PDF_delete( ).
> Use PDF_TRY_DL( ) and PDF_CATCH_DL( ) instead of PDF_TRY( ) and PDF_CATCH( ).
> Use function pointers for all other PDFlib calls.
> PDF_get_opaque( ) must not be used.
> Compile the auxiliary module pdflibdl.c and link your application against it.
Note Loading the PDFlib DLL at runtime is supported on selected platforms only.
Error handling in C. PDFlib supports structured exception handling with try/catch clauses. This allows C and C++ clients to catch exceptions which are thrown by PDFlib, and react to them appropriately. In the catch clause the client has access to a string describing the exact nature of the problem, a unique exception number, and the name of the PDFlib API function which threw the exception. The general structure of a PDFlib C client program with exception handling looks as follows:
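PDF *p;

if ((p = PDF_new()) == (PDF *) 0) {
    printf("Couldn't create PDFlib object (out of memory)!\n");
    return(2);
}

PDF_TRY(p) {
    /* ...some PDFlib instructions... */
}
PDF_CATCH(p) {
    printf("PDFlib exception occurred in hello sample:\n");
    printf("[%d] %s: %s\n",
        PDF_get_errnum(p), PDF_get_apiname(p), PDF_get_errmsg(p));
    PDF_delete(p);
    return(2);
}

PDF_delete(p);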
PDF_TRY/PDF_CATCH are implemented as tricky preprocessor macros. Accidentally omitting one of them will result in compiler error messages which may be difficult to comprehend. Make sure to use the macros exactly as shown above, with no additional code between the TRY and CATCH clauses (except PDF_CATCH( )).
An important task of the catch clause is to clean up PDFlib internals using PDF_delete( ) and the pointer to the PDFlib object. PDF_delete( ) will also close the output file if necessary. After fatal exceptions the PDF document cannot be used, and will be left in an incomplete and inconsistent state. Obviously, the appropriate action when an exception occurs is application-specific.
For C and C++ clients which do not catch exceptions, the default action upon exceptions is to issue an appropriate message on the standard error channel, and exit on fatal errors. The PDF output file will be left in an incomplete state! Since this may not be adequate for a library routine, for serious PDFlib projects it is strongly advised to leverage PDFlib's exception handling facilities. A user-defined catch clause may, for example, present the error message in a GUI dialog box, and take other measures instead of aborting.
Volatile variables. Special care must be taken regarding variables that are used in both the PDF_TRY( ) and the PDF_CATCH( ) blocks. Since the compiler doesn't know about the control transfer from one block to the other, it might produce inappropriate code (e.g., register variable optimizations) in this situation. Fortunately, there is a simple rule to avoid these problems:
Note Variables used in both the PDF_TRY( ) and PDF_CATCH( ) blocks should be declared volatile.
Using the volatile keyword signals to the compiler that it must not apply (potentially dangerous) optimizations to the variable.
Nesting try/catch blocks and rethrowing exceptions. PDF_TRY( ) blocks may be nested to an arbitrary depth. In the case of nested error handling, the inner catch block can activate the outer catch block by rethrowing the exception:
PDF_TRY(p) {            /* outer try block */
    /* ... */
    PDF_TRY(p) {
        /* ... */
    }
    PDF_CATCH(p) {
        /* error cleanup */
        PDF_RETHROW(p);
    }
    /* ... */
}
PDF_CATCH(p) {
    /* more error cleanup */
    PDF_delete(p);
}
The PDF_RETHROW( ) invocation in the inner error handler will immediately transfer program execution to the first statement of the outer PDF_CATCH( ) block.
Prematurely exiting a try block. If a PDF_TRY( ) block is left (e.g., by means of a return statement), thus bypassing the invocation of the corresponding PDF_CATCH( ) macro, the PDF_EXIT_TRY( ) macro must be used to inform the exception machinery. No other library function may be called between this macro and the end of the try block:
PDF_TRY(p) {
    /* ... */
    if (error_condition) {
        PDF_EXIT_TRY(p);
        return -1;
    }
}
PDF_CATCH(p) {
    /* error cleanup */
    PDF_RETHROW(p);
}
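The volatile rule and the rethrow mechanism above are easier to see in a self-contained model. The sketch below assumes a setjmp( )/longjmp( )-based design, which is the usual way to build try/catch macros in C; it does not reproduce PDFlib's actual macros (see pdflib.h for those), and all names (g_handler, throw_error, demo_volatile, demo_rethrow) are hypothetical. It shows a volatile variable that keeps its value across the jump into the catch branch, and an inner handler that cleans up and rethrows to an outer one.

```c
#include <setjmp.h>

/* Hypothetical exception state; not PDFlib's implementation. */
static jmp_buf *g_handler;  /* innermost active "try" block */
static int g_errnum;        /* number of the last thrown error */

static void throw_error(int errnum)
{
    g_errnum = errnum;
    longjmp(*g_handler, 1);  /* resume at the matching setjmp() */
}

/* Counts loop steps and "throws" at step throw_at (if it is in 1..10).
 * 'steps' is modified after setjmp() and read in the catch branch,
 * so it must be volatile to have a well-defined value there. */
int demo_volatile(int throw_at)
{
    jmp_buf env;
    volatile int steps = 0;

    g_handler = &env;
    if (setjmp(env) == 0) {               /* "try" */
        for (int i = 1; i <= 10; i++) {
            steps = i;
            if (i == throw_at)
                throw_error(42);
        }
        return steps;                     /* no error: 10 */
    } else {                              /* "catch" */
        return -steps;                    /* step at which the error occurred */
    }
}

/* Nested handlers: the inner "catch" cleans up, then rethrows to the outer one. */
int demo_rethrow(void)
{
    jmp_buf outer, inner;
    volatile int trace = 0;               /* records which handlers ran */

    g_handler = &outer;
    if (setjmp(outer) == 0) {
        g_handler = &inner;               /* enter the nested "try" */
        if (setjmp(inner) == 0) {
            throw_error(7);
        } else {
            trace += 1;                   /* inner cleanup */
            g_handler = &outer;           /* pop the inner handler ... */
            throw_error(g_errnum);        /* ... and rethrow */
        }
    } else {
        trace += 10;                      /* outer cleanup */
        return trace * 100 + g_errnum;    /* 11 * 100 + 7 */
    }
    return -1;                            /* not reached */
}
```

In this model, returning from inside a "try" block would leave g_handler pointing at a stack frame that no longer exists. This presumably explains why PDFlib requires PDF_EXIT_TRY( ) before such a return.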
Memory management in C. In order to allow for maximum flexibility, PDFlib's internal memory management routines (which are based on standard C malloc/free) can be replaced by external procedures provided by the client. These procedures will be called for all PDFlib-internal memory allocation or deallocation. Memory management routines can be installed with a call to PDF_new2( ), and will be used in lieu of PDFlib's internal routines. Either all or none of the following routines must be supplied:
> an allocation routine
> a deallocation (free) routine
> a reallocation routine for enlarging memory blocks previously allocated with the allocation routine.
The memory routines must adhere to the standard C malloc/free/realloc semantics, but may choose an arbitrary implementation. All routines will be supplied with a pointer to the calling PDFlib object. The only exception to this rule is that the very first call to the allocation routine will supply a PDF pointer of NULL. Client-provided memory allocation routines must therefore be prepared to deal with a NULL PDF pointer.
Using the PDF_get_opaque( ) function, an opaque application-specific pointer can be retrieved from the PDFlib object. The opaque pointer itself is supplied by the client in the PDF_new2( ) call. The opaque pointer is useful for multi-threaded applications which may want to keep a pointer to thread- or class-specific data inside the PDFlib object, for use in memory management or error handling.
Unicode in the C language binding. Clients of the C language binding must not use the standard text output functions (PDF_show( ), PDF_show_xy( ), and PDF_continue_text( )) when the text may contain embedded null characters. In such cases the alternate functions PDF_show2( ) etc. must be used, and the length of the string must be supplied separately. This is not a concern for the other language bindings, since the PDFlib language wrappers internally call PDF_show2( ) etc. anyway.
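The client-supplied memory routines described above can be sketched as a set of three counting wrappers around malloc/realloc/free. This is a minimal sketch under stated assumptions. The parameter lists (including a caller-name string) follow what we assume pdflib.h declares for the PDF_new2( ) callbacks, so check the header before relying on them. PDF is replaced by a stand-in struct so the code compiles on its own, and sketch_get_opaque( ) stands in for PDF_get_opaque( ). The mem_stats type and all my_* names are hypothetical. Note how the NULL PDF pointer of the very first allocation is handled.

```c
#include <stdlib.h>

/* Stand-in for the PDFlib object so the sketch compiles on its own;
 * a real client gets the PDF type from pdflib.h. */
typedef struct PDF_s { void *opaque; } PDF;

/* Hypothetical per-client statistics kept behind the opaque pointer. */
typedef struct {
    size_t allocs, reallocs, frees;
} mem_stats;

/* Counters for calls made before a PDFlib object exists (p == NULL). */
static mem_stats g_bootstrap;

/* Stand-in for PDF_get_opaque(); real code would call the PDFlib function. */
static void *sketch_get_opaque(PDF *p) { return p->opaque; }

static mem_stats *stats_for(PDF *p)
{
    /* The very first allocation arrives with p == NULL. */
    return p == NULL ? &g_bootstrap : (mem_stats *) sketch_get_opaque(p);
}

void *my_alloc(PDF *p, size_t size, const char *caller)
{
    (void) caller;                 /* name of the requesting routine; unused */
    stats_for(p)->allocs++;
    return malloc(size);           /* malloc() semantics */
}

void *my_realloc(PDF *p, void *mem, size_t size, const char *caller)
{
    (void) caller;
    stats_for(p)->reallocs++;
    return realloc(mem, size);     /* realloc() semantics */
}

void my_free(PDF *p, void *mem)
{
    stats_for(p)->frees++;
    free(mem);                     /* free() semantics */
}
```

Keeping the statistics behind the opaque pointer, rather than in globals, lets each thread's PDFlib object count its own allocations. Only the bootstrap call with a NULL pointer falls back to shared state.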
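The rule about embedded null characters in the C binding follows directly from how C measures strings. UTF-16 text of Latin characters contains a zero byte in every code unit, so strlen( ) stops early. The minimal illustration below contains no PDFlib calls, and the names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* "Hi" in UTF-16BE: each ASCII character gets a leading 0x00 byte. */
static const char hi_utf16be[] = { 0x00, 'H', 0x00, 'i' };

/* strlen() stops at the first zero byte, which is the very first byte here. */
size_t naive_length(void) { return strlen(hi_utf16be); }

/* The real length must be known independently of any terminator. */
size_t explicit_length(void) { return sizeof hi_utf16be; }
```

This is why functions such as PDF_show2( ) take the string length as a separate argument rather than relying on a terminating null.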
> Switch the application's string handling to wstrings. This includes data from external sources. In addition, string literals in the source code (including option lists!) must be adjusted by prepending the L prefix, e.g.
const wstring imagefile = L"nesrin.jpg";
image = p.load_image(L"auto", imagefile, L"");
> Suitable wstring-capable methods (wcerr etc.) must be used to process PDFlib error messages and exception strings (get_errmsg( ) method in the PDFlib and PDFlibException classes).
> Remove PDFlib method calls which are required only for non-Unicode-capable languages, especially the following:
p.set_parameter("hypertextencoding", "host");
> The pdflib.cpp module is no longer required for the PDFlib C++ binding. Although the PDFlib distribution contains a dummy implementation of this module, it should be removed from the build process of PDFlib applications.
Full source code compatibility with legacy applications. The new C++ binding has been designed with application-level source code compatibility in mind, but client applications must be recompiled. The following aids are available to achieve full source code compatibility for legacy applications:
> Disable the wstring-based interface as follows before including pdflib.hpp:
#define PDFCPP_PDFLIB_WSTRING 0
Error handling in C++. PDFlib API functions will throw a C++ exception in case of an error. These exceptions must be caught in the client code by using C++ try/catch clauses. In order to provide extended error information the PDFlib class provides a public PDFlib::Exception class which exposes methods for retrieving the detailed error message, the exception number, and the name of the PDFlib API function which threw the exception. Native C++ exceptions thrown by PDFlib routines will behave as expected. The following code fragment will catch exceptions thrown by PDFlib:
try {
    /* ...some PDFlib instructions... */
}
catch (PDFlib::Exception &ex) {
    wcerr << L"PDFlib exception occurred in hello sample: " << endl
          << L"[" << ex.get_errnum() << L"] " << ex.get_apiname()
          << L": " << ex.get_errmsg() << endl;
}
Memory management in C++. Client-supplied memory management for the C++ binding works the same as with the C language binding. The PDFlib constructor accepts an optional error handler, optional memory management procedures, and an optional opaque pointer argument. Default NULL arguments are supplied in pdflib.hpp, which will result in PDFlib's internal error and memory management routines becoming active. All memory management functions must be C functions, not C++ methods.
This package is available in the pdflib.jar file and contains a single class called pdflib. In order to supply this package to your application, you must add pdflib.jar to your CLASSPATH environment variable, add the option -classpath pdflib.jar in your calls to the Java compiler and runtime, or perform equivalent steps in your Java IDE. In the JDK you can configure the Java VM to search for native libraries in a given directory by setting the java.library.path property to the name of the directory, e.g.
java -Djava.library.path=. pdfclock
In addition, the following platform-dependent steps must be performed:
> Unix: the library libpdf_java.so (on Mac OS X: libpdf_java.jnilib) must be placed in one of the default locations for shared libraries, or in an appropriately configured directory.
> Windows: the library pdf_java.dll must be placed in the Windows system directory, or a directory which is listed in the PATH environment variable.
Using PDFlib in J2EE application servers and Servlet containers. PDFlib is well suited for server-side Java applications. The PDFlib distribution contains sample code and configuration for using PDFlib in J2EE environments. The following configuration issues must be observed:
> The directory where the server looks for native libraries varies among vendors. Common candidate locations are system directories, directories specific to the underlying Java VM, and local server directories. Please check the documentation supplied by the server vendor.
> Application servers and Servlet containers often use a special class loader which may be restricted or use a dedicated classpath. For some servers it is required to define a special classpath to make sure that the PDFlib package will be found.
More detailed notes on using PDFlib with specific Servlet engines and application servers can be found in additional documentation in the J2EE directory of the PDFlib distribution.
Error handling in Java. The Java binding installs a special error handler which translates PDFlib errors to native Java exceptions. In case of an exception PDFlib will throw a native Java exception of the following class:
PDFlibException
The Java exceptions can be dealt with by the usual try/catch technique:
pdflib p = null;

try {
    p = new pdflib();
    /* ...some PDFlib instructions... */
} catch (PDFlibException e) {
    System.err.print("PDFlib exception occurred in hello sample:\n");
    System.err.print("[" + e.get_errnum() + "] " + e.get_apiname() +
        ": " + e.get_errmsg() + "\n");
} catch (Exception e) {
    System.err.println(e.getMessage());
} finally {
    if (p != null) {
        p.delete();
    }
}
Since PDFlib declares appropriate throws clauses, client code must either catch all possible PDFlib exceptions, or declare those itself.
Unicode and legacy encoding conversion. For the convenience of PDFlib users we list some useful string conversion methods here. Please refer to the Java documentation for more details. The following constructor creates a Unicode string from a byte array, using the platform's default encoding:
String(byte[] bytes)
The following constructor creates a Unicode string from a byte array, using the encoding supplied in the enc parameter (e.g. SJIS, UTF8, UTF-16):
String(byte[] bytes, String enc)
The following method of the String class converts a Unicode string to a byte array according to the encoding specified in the enc parameter:
byte[] getBytes(String enc)
Javadoc documentation for PDFlib. The PDFlib package contains Javadoc documentation for PDFlib. The Javadoc contains only abbreviated descriptions of all PDFlib API methods; please refer to the PDFlib Reference for more details. In order to configure Javadoc for PDFlib in Eclipse proceed as follows:
> In the Package Explorer right-click on the Java project and select Javadoc Location.
> Click on Browse... and select the path where the Javadoc (which is part of the PDFlib package) is located.
After these steps you can browse the Javadoc for PDFlib, e.g. with the Java Browsing perspective or via the Help menu.
Using PDFlib with Groovy. The PDFlib Java binding can also be used with the Groovy language. The API calls are identical to the Java calls; only the object instantiation is slightly different. A simple example for using PDFlib with Groovy is contained in the PDFlib distribution.
Unix. Perl will search both pdflib_pl.so (on Mac OS X: pdflib_pl.dylib) and pdflib_pl.pm in the current directory, or the directory printed by the following Perl command:
perl -e 'use Config; print $Config{sitearchexp};'
Perl will also search the subdirectory auto/pdflib_pl. Typical output of the above command looks like
/usr/lib/perl5/site_perl/5.8/i686-linux
Windows. PDFlib supports the ActiveState port of Perl 5 to Windows, also known as ActivePerl.² Both pdflib_pl.dll and pdflib_pl.pm will be searched for in the current directory, or the directory printed by the following Perl command:
perl -e "use Config; print $Config{sitearchexp};"
Error Handling in Perl. The Perl binding installs a special error handler which translates PDFlib errors to native Perl exceptions. The Perl exceptions can be dealt with by applying the appropriate language constructs, i.e., by bracketing critical sections:
eval {
    # ...some PDFlib instructions...
};
die "Exception caught" if $@;
More than one way of string handling. Depending on the requirements of your application you can work with UTF-8, UTF-16, or legacy encodings. The following code snippets demonstrate all three variants. All examples create the same Japanese output, but accept the string input in different formats. The first example works with Unicode UTF-8 and uses the Unicode::String module (part of most modern Perl distributions and available on CPAN). Since Perl works with UTF-8 internally, no explicit UTF-8 conversion is required:
use Unicode::String qw(utf8 utf16 uhex);
...
PDF_set_parameter($p, "textformat", "utf8");
$font = PDF_load_font($p, "Arial Unicode MS", "unicode", "");
PDF_setfont($p, $font, 24.0);
PDF_set_text_pos($p, 50, 700);
PDF_show($p, uhex("U+65E5 U+672C U+8A9E"));
The second example works with Unicode UTF-16 and little-endian byte order:
PDF_set_parameter($p, "textformat", "utf16le");
$font = PDF_load_font($p, "Arial Unicode MS", "unicode", "");
PDF_setfont($p, $font, 24.0);
PDF_set_text_pos($p, 50, 700);
PDF_show($p, "\xE5\x65\x2C\x67\x9E\x8A");
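The byte string in the UTF-16 example is the little-endian UTF-16 encoding of the three code points used in the UTF-8 example (U+65E5 U+672C U+8A9E), low byte first. The small C sketch below derives these bytes. It handles BMP code points only (no surrogate pairs), and utf16le_encode is a hypothetical helper, not a PDFlib function.

```c
#include <stddef.h>
#include <stdint.h>

/* Encodes BMP code points (no surrogate pairs) as UTF-16LE.
 * Writes 2 bytes per code point into out; returns the byte count. */
size_t utf16le_encode(const uint16_t *cps, size_t n, unsigned char *out)
{
    for (size_t i = 0; i < n; i++) {
        out[2 * i]     = (unsigned char) (cps[i] & 0xFF);  /* low byte first */
        out[2 * i + 1] = (unsigned char) (cps[i] >> 8);    /* then high byte */
    }
    return 2 * n;
}
```

Feeding it 0x65E5, 0x672C, 0x8A9E yields E5 65 2C 67 9E 8A, the same bytes as the Perl literal above.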
The third example works with Shift-JIS. Except on Windows systems it requires access to the 90ms-RKSJ-H CMap for string conversion:
PDF_set_parameter($p, "SearchPath", "../../../resource/cmap");
$font = PDF_load_font($p, "Arial Unicode MS", "cp932", "");
PDF_setfont($p, $font, 24.0);
PDF_set_text_pos($p, 50, 700);
PDF_show($p, "\x93\xFA\x96\x7B\x8C\xEA");
Unicode and legacy encoding conversion. For the convenience of PDFlib users we list some useful string conversion methods here. Please refer to the Perl documentation for more details. The following statement creates a Unicode string from hexadecimal character codes:
$logos="\x{039b}\x{03bf}\x{03b3}\x{03bf}\x{03c3}\x{0020}" ;
The following statement creates a Unicode string from a Unicode character name:
use charnames ':full';   # needed for \N{...} in Perl versions before 5.16
$delta = "\N{GREEK CAPITAL LETTER DELTA}";
The Encode module supports many encodings and has interfaces for converting between those encodings:
use Encode 'decode';
$data = decode("iso-8859-3", $data);   # convert from legacy encoding to a Perl Unicode string
PHP will search the library in the directory specified in the extension_dir variable in php.ini on Unix, and additionally in the standard system directories on Windows. You can test which version of the PHP PDFlib binding you have installed with the following one-line PHP script:
<?php phpinfo(); ?>
This will display a long info page about your current PHP configuration. On this page check the section titled "pdf". If this section contains "PDFlib GmbH Binary Version" (and the PDFlib version number), you are using the supported new PDFlib wrapper. The unsupported old wrapper will display "PDFlib GmbH Version" instead.
> Load PDFlib at runtime with one of the following lines at the start of your script:
dl("libpdf_php.so");    # for Unix
dl("libpdf_php.dll");   # for Windows
1. See www.php.net
Modified error return for PDFlib functions in PHP. Since PHP uses the convention of returning the value 0 (FALSE) when an error occurs within a function, all PDFlib functions have been adjusted to return 0 instead of -1 in case of an error. This difference is noted in the function descriptions in the PDFlib Reference. However, take care when reading the code fragment examples in Section 3, PDFlib Programming, page 49, since they use the usual PDFlib convention of returning -1 in case of an error.
File name handling in PHP. Unqualified file names (without any path component) and relative file names for PDF, image, font and other disk files are handled differently in Unix and Windows versions of PHP:
> PHP on Unix systems will find files without any path component in the directory where the script is located.
> PHP on Windows will find files without any path component only in the directory where the PHP DLL is located.
In order to provide platform-independent file name handling, the use of PDFlib's SearchPath facility is strongly recommended (see Section 3.1.3, Resource Configuration and File Searching, page 52).
Exception handling in PHP. Since PHP 5 supports structured exception handling, PDFlib exceptions will be propagated as PHP exceptions. PDFlib will throw an exception of the class PDFlibException, which is derived from PHP's standard Exception class. You can use the standard try/catch technique to deal with PDFlib exceptions:
try {
    /* ...some PDFlib instructions... */
} catch (PDFlibException $e) {
    print "PDFlib exception occurred:\n";
    print "[" . $e->get_errnum() . "] " . $e->get_apiname() . ": "
        . $e->get_errmsg() . "\n";
} catch (Exception $e) {
    print $e;
}
Unicode and legacy encoding conversion. The iconv module can be used for string conversions. Please refer to the PHP documentation for more details.
PDFlib development with Eclipse and Zend Studio. The PHP Development Tools (PDT)¹ support PHP development with Eclipse and Zend Studio. PDT can be configured to support context-sensitive help with the steps outlined below. Add PDFlib to the Eclipse preferences so that it will be known to all PHP projects:
> Select Window, Preferences, PHP, PHP Libraries, New... to launch a wizard.
> In User library name enter PDFlib, click Add External folder... and select the folder bind\php\Eclipse PDT.
In an existing or new PHP project you can add a reference to the PDFlib library as follows:
> In the PHP Explorer right-click on the PHP project and select Include Path, Configure Include Path...
> Go to the Libraries tab, click Add Library..., and select User Library, PDFlib.
After these steps you can explore the list of PDFlib methods under the PHP Include Path/PDFlib/PDFlib node in the PHP Explorer view. When writing new PHP code, Eclipse will assist with code completion and context-sensitive help for all PDFlib methods.
1. See www.eclipse.org/pdt
1. See www.python.org
1. See www.realbasic.com
If the PDFlib source file library is not on top of your library list you have to specify the library as well:
d/copy PDFsrclib/QRPGLESRC,PDFLIB
Before you start compiling your ILE-RPG program you have to create a binding directory that includes the PDFLIB and PDFLIB_RPG service programs shipped with PDFlib. The following example assumes that you want to create a binding directory called PDFLIB in the library PDFLIB:
CRTBNDDIR BNDDIR(PDFLIB/PDFLIB) TEXT('PDFlib Binding Directory')
After creating the binding directory you need to add the PDFLIB and PDFLIB_RPG service programs to your binding directory. The following commands add the service programs PDFLIB and PDFLIB_RPG from the library PDFLIB to the binding directory created earlier:
ADDBNDDIRE BNDDIR(PDFLIB/PDFLIB) OBJ((PDFLIB/PDFLIB *SRVPGM))
ADDBNDDIRE BNDDIR(PDFLIB/PDFLIB) OBJ((PDFLIB/PDFLIB_RPG *SRVPGM))
Now you can compile your program using the CRTBNDRPG command (or option 14 in PDM):
CRTBNDRPG PGM(PDFLIB/HELLO) SRCFILE(PDFLIB/QRPGLESRC) SRCMBR(*PGM) DFTACTGRP(*NO) BNDDIR(PDFLIB/PDFLIB)
Error handling in RPG. PDFlib clients written in ILE-RPG can install an error handler in PDFlib which will be activated when an exception occurs. Since ILE-RPG translates all procedure names to uppercase, the name of the error handler procedure should be specified in uppercase. The following skeleton demonstrates this technique:
 **********************************************************************
d/copy QRPGLESRC,PDFLIB
 **********************************************************************
d p               S               *
d font            s             10i 0
 *
d error           s             50
 *
d errhdl          s               *   procptr
 *
 * Prototype for exception handling procedure
 *
d errhandler      PR
d p                               *   value
d type                          10i 0 value
d shortmsg                    2048
 **********************************************************************
c                   clear                   error
 *
 * Set the procedure pointer to the ERRHANDLER procedure.
 *
c                   eval      errhdl=%paddr('ERRHANDLER')
 *
c                   eval      p=pdf_new2(errhdl:*null:*null:*null:*null)
 * ...PDFlib instructions...
c                   callp     PDF_delete(p)
 *
c                   exsr      exit
 **********************************************************************
c     exit          begsr
c                   if        error<>*blanks
c     error         dsply
c                   endif
c                   seton                                        lr
c                   return
c                   endsr
 **********************************************************************
 * If any of the PDFlib functions will cause an exception, first the error handler
 * will be called and after that we will get a regular RPG exception.
c     *pssr         begsr
c                   exsr      exit
c                   endsr
 **********************************************************************
 * Exception Handler Procedure
 * This procedure will be linked to PDFlib by passing the procedure pointer to
 * PDF_new2. This procedure will be called when a PDFlib exception occurs.
 *
 **********************************************************************
p errhandler      B
d errhandler      PI
d p                               *   value
d type                          10i 0 value
d c_message                   2048
 *
d length          s             10i 0
 *
 * Chop off the trailing x'00' (we are called by a C program)
 * and set the error (global) string
c                   clear                   error
c     x'00'         scan      c_message     length                   50
c                   sub       1             length
c                   if        *in50 and length>0
c                   if        length>%size(error)
c                   eval      error=c_message
c                   else
c                   eval      error=%subst(c_message:1:length)
c                   endif
c                   endif
 *
 * Always call PDF_delete to clean up PDFlib
c                   callp     PDF_delete(p)
 *
c                   return
 *
p errhandler      E
However, Ruby will search other directories for extensions as well. In order to retrieve a list of these directories you can use the following Ruby command:
ruby -e "puts $:"
This list will usually include the current directory, so for testing purposes you can simply place the PDFlib extension library and the scripts in the same directory. Error Handling in Ruby. The Ruby binding installs a special error handler which translates PDFlib exceptions to native Ruby exceptions. The Ruby exceptions can be dealt with by the usual rescue technique:
begin
  # ...some PDFlib instructions...
rescue PDFlibException => pe
  print "PDFlib exception occurred in hello sample:\n"
  print "[" + pe.get_errnum.to_s + "] " + pe.get_apiname + ": " + pe.get_errmsg + "\n"
end
Ruby on Rails. Ruby on Rails² is an open-source framework which facilitates Web development with Ruby. The PDFlib extension for Ruby can be used with Ruby on Rails; examples are included in the package. Follow these steps to run the PDFlib examples for Ruby on Rails:
> Install Ruby.
> Install Ruby on Rails.
> Unpack the PDFlib package for Ruby which contains samples for Ruby on Rails.
> Change to the bind/ruby/RubyOnRails directory and start the Ruby web server:
ruby script/server
> Point your browser to http://localhost:3000. The code for the PDFlib samples can be found in app/controllers/pdflib_controller.rb.
Local PDFlib installation. If you want to use PDFlib only with Ruby on Rails, but cannot install it globally for general use with Ruby, you can install PDFlib locally in the vendor directory within the Rails tree. This is particularly useful if you do not have permission to install Ruby extensions for general use, but want to work with PDFlib in Rails nevertheless.
Unix: the library pdflib_tcl.so (on Mac OS X: pdflib_tcl.dylib) must be placed in one of the default locations for shared libraries, or in an appropriately configured directory. Usually both pkgIndex.tcl and pdflib_tcl.so will be placed in the directory
/usr/lib/tcl8.4/pdflib
Windows: the files pkgIndex.tcl and pdflib_tcl.dll will be searched for in the directories
C:\Program Files\Tcl\lib\pdflib
C:\Program Files\Tcl\lib\tcl8.3\pdflib
Error handling in Tcl. The Tcl binding installs a special error handler which translates PDFlib errors to native Tcl exceptions. The Tcl exceptions can be dealt with by the usual catch technique:
if {[catch {
    # ...some PDFlib instructions...
} result]} {
    puts stderr "Exception caught!"
    puts stderr $result
}
1. See www.tcl.tk
3 PDFlib Programming
3.1 General Programming
Cookbook Code samples regarding general programming issues can be found in the general category of the PDFlib Cookbook.
Error policies. When PDFlib detects an error condition, it will react according to one of several strategies which can be configured with the errorpolicy parameter. All functions which can return error codes also support an errorpolicy option. The following error policies are supported:
> errorpolicy=legacy: this deprecated setting ensures behavior which is compatible with earlier versions of PDFlib, where exceptions and error return values are controlled by parameters and options such as fontwarning, imagewarning, etc. This is only recommended for applications which require source code compatibility with PDFlib 6. It should not be used for new applications. The legacy setting is the default error policy.
> errorpolicy=return: when an error condition is detected, the respective function will return with a -1 (in PHP: 0) error value regardless of any warning parameters or options. The application developer must check the return value to identify problems, and must react to the problem in whatever way is appropriate for the application. This is the recommended approach since it allows a unified approach to error handling.
> errorpolicy=exception: an exception will be thrown when an error condition is detected. However, the output document will be unusable after an exception. This can be used for lazy programming without any error conditionals, at the expense of sacrificing the output document even for problems which may be fixable by the application.
The following code fragments demonstrate different strategies with respect to exception handling. The examples try to load a font which may or may not be available. If errorpolicy=return, the return value must be checked for an error. If it indicates failure, the reason for the failure can be queried in order to properly deal with the situation:
font = p.load_font("MyFontName", "unicode", "errorpolicy=return");
if (font == -1) {
    /* font handle is invalid; find out what happened. */
    errmsg = p.get_errmsg();
    /* Try a different font or give up */
    ...
}
/* font handle is valid; continue */
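The "try a different font or give up" branch of the errorpolicy=return pattern can be fleshed out as a fallback loop. In the minimal sketch below, fake_load_font( ) is a hypothetical stub that stands in for a load call with errorpolicy=return. It returns a non-negative handle or -1 and records a message in last_errmsg, which plays the role of get_errmsg( ). No real PDFlib function is called, and the font names are only sample data.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the error text PDFlib reports via get_errmsg(). */
static const char *last_errmsg = "";

/* Hypothetical stub for a load call with errorpolicy=return:
 * returns a non-negative handle on success and -1 on failure. */
static int fake_load_font(const char *name)
{
    static const char *available[] = { "Helvetica", "Courier" };
    for (int i = 0; i < 2; i++)
        if (strcmp(name, available[i]) == 0)
            return i;
    last_errmsg = "font not found";
    return -1;
}

/* Tries each candidate in order; returns the first valid handle or -1. */
int load_first_available(const char *const *candidates, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        int font = fake_load_font(candidates[i]);
        if (font != -1)
            return font;          /* handle is valid; continue with it */
        /* handle is invalid: last_errmsg says why; try the next candidate */
    }
    return -1;                    /* give up */
}
```

Because errors come back as return values, the application decides per call whether a failure is recoverable, and the output document stays usable.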
Cookbook A full code sample can be found in the Cookbook topic general/error_handling. Warnings. Some problem conditions can be detected by PDFlib internally, but do not justify interrupting the program flow by throwing an exception. While earlier versions of PDFlib supported the concept of non-fatal exceptions which can be disabled, PDFlib 7 never throws an exception for non-fatal conditions. Instead, a description of the condition will be logged (if logging is enabled). Logging can be enabled as follows:
p.set_parameter("logging", "filename=private.log");
We recommend the following approach with respect to warnings:
> Enable warning logging in the development phase, and carefully study any warning messages in the log file. They may point to potential problems in your code or data, and you should try to understand or eliminate the reason for those warnings.
> Disable warning logging in the production phase, and re-enable it only in case of problems.
plied), but only the corresponding data structures used for PVF file name administration. This gives rise to the following strategies:
> Minimize memory usage: it is recommended to call PDF_delete_pvf( ) immediately after the API call which accepted the virtual file name, and another time after PDF_close( ). The second call is required because PDFlib may still need access to the data, so that the first call refuses to unlock the virtual file. However, in some cases the first call will already free the data, and the second call doesn't do any harm. The client may free the file contents only when PDF_delete_pvf( ) succeeded.
> Optimize performance by reusing virtual files: some clients may wish to reuse some data (e.g., font definitions) within various output documents, and avoid multiple create/delete cycles for the same file contents. In this case it is recommended not to call PDF_delete_pvf( ) as long as more PDF output documents using the virtual file will be generated.
> Lazy programming: if memory usage is not a concern, the client may elect not to call PDF_delete_pvf( ) at all. In this case PDFlib will internally delete all pending virtual files in PDF_delete( ).
In all cases the client may free the corresponding data only when PDF_delete_pvf( ) returned successfully, or after PDF_delete( ).
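The strategies above share one invariant: the client's buffer may be freed only after the delete call has succeeded. The following toy model simulates a lock held by the library and the two-call pattern of the "Minimize memory usage" strategy. All names are hypothetical, and this is not how PDFlib implements virtual files. It only mimics the behavior described above, where a delete of a locked file is refused and repeating the call is harmless.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Toy model (not PDFlib code) of one virtual file entry. */
typedef struct {
    void *data;    /* client-owned file contents */
    int locks;     /* > 0 while the library still needs the data */
    bool exists;   /* name still registered */
} vfile;

static void vf_create(vfile *f, void *data)
{
    f->data = data;
    f->locks = 0;
    f->exists = true;
}

/* Mimics the delete call: refuses (returns false) while the data is locked. */
static bool vf_delete(vfile *f)
{
    if (!f->exists)
        return true;           /* already gone: a repeated call does no harm */
    if (f->locks > 0)
        return false;          /* still in use: the client must NOT free yet */
    f->exists = false;
    return true;
}

/* Two-call pattern: delete right after use, and again after closing. */
int minimize_memory_demo(void)
{
    vfile f;
    char *contents = malloc(16);
    if (contents == NULL)
        return -1;
    vf_create(&f, contents);

    f.locks = 1;                       /* library still references the data */
    bool first = vf_delete(&f);        /* first call: refused */
    f.locks = 0;                       /* document closed: lock released */
    bool second = vf_delete(&f);       /* second call: succeeds */

    if (second)
        free(contents);                /* only now is freeing safe */
    return (first ? 10 : 0) + (second ? 1 : 0);
}
```

The demo returns 1: the first delete is refused and the second succeeds, so the buffer is freed exactly once, after the lock is gone.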
Table 3.1 Resource categories supported in PDFlib
category              format     explanation
Encoding              key=value  text file containing an 8-bit encoding or code page table
HostFont              key=value  name of a font installed on the system
ICCProfile            key=value  name of an ICC color profile
StandardOutputIntent  key=value  name of a standard output condition for PDF/X (in addition to those which are already built into PDFlib, see PDFlib Reference for a complete list)
Redundant resource entries should be avoided. For example, do not include multiple entries for a certain font's metrics data. Also, the font name as configured in the UPR file should exactly match the actual font name in order to avoid confusion (although PDFlib does not enforce this restriction).
The UPR file format. UPR files are text files with a very simple structure that can easily be written in a text editor or generated automatically. To start with, let's take a look at some syntactical issues:
> Lines can have a maximum of 255 characters.
> A backslash \ escapes newline characters. This may be used to extend lines. Use two backslashes in order to create a single literal backslash.
> An isolated period character . serves as a section terminator.
> All entries are case-sensitive.
> Comment lines may be introduced with a percent % character, and are terminated by the end of the line. A preceding backslash can be used to create literal percent characters which do not start a comment.
> Whitespace is ignored everywhere except in resource names and file names.
UPR files consist of the following components:
> A magic line for identifying the file. It has the following form:
PS-Resources-1.0
> An optional section listing all resource categories described in the file. Each line describes one resource category. The list is terminated by a line with a single period character. Available resource categories are described below.
> A section for each of the resource categories listed at the beginning of the file. Each section starts with a line showing the resource category, followed by an arbitrary number of lines describing available resources. The list is terminated by a line with a single period character. Each resource data line contains the name of the resource (equal signs have to be quoted). If the resource requires a file name, this name has to be added after an equal sign. The SearchPath (see below) will be applied when PDFlib searches for files listed in resource entries.
File searching and the SearchPath resource category. PDFlib reads a variety of data items, such as raster images, font outline and metrics information, encoding definitions, PDF documents, and ICC color profiles from disk files. In addition to relative or absolute path names you can also use file names without any path specification. The SearchPath resource category can be used to specify a list of path names for directories containing the required data files. When PDFlib must open a file it will first use the file name exactly as supplied and try to open the file. If this attempt fails, PDFlib will try to open the file in the directories specified in the SearchPath resource category one after another until it succeeds. SearchPath entries can be accumulated, and will be searched in reverse order (paths set at a later point in time will be searched before earlier ones). This feature can be used to free PDFlib applications from platform-specific file system schemes. You can set search path entries as follows:
p.set_parameter("SearchPath", "/path/to/dir1");
p.set_parameter("SearchPath", "/path/to/dir2");
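The lookup order described above (the name as supplied first, then the SearchPath entries with the most recently added entry first) can be sketched in plain Java. This is a model of the documented behavior, not PDFlib code; SearchPathModel, addEntry() and the test helper tempDirWith() are hypothetical names, and treating an absolute name as "no search" reflects the note that a fully specified path name disables the search.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Sketch of the SearchPath lookup rules described above (not PDFlib code):
// try the name as given, then the SearchPath entries, most recent entry first.
final class SearchPathModel {
    private final List<String> entries = new ArrayList<>();

    // Analogous to p.set_parameter("SearchPath", dir): entries accumulate.
    void addEntry(String dir) {
        entries.add(dir);
    }

    Optional<Path> resolve(String fileName) {
        Path direct = Paths.get(fileName);
        if (Files.isRegularFile(direct)) {
            return Optional.of(direct);
        }
        if (direct.isAbsolute()) {
            return Optional.empty(); // a fully specified path disables the search
        }
        for (int i = entries.size() - 1; i >= 0; i--) { // reverse order
            Path candidate = Paths.get(entries.get(i)).resolve(fileName);
            if (Files.isRegularFile(candidate)) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }

    // Hypothetical test helper: creates a temporary directory with empty files.
    static Path tempDirWith(String... names) {
        try {
            Path dir = Files.createTempDirectory("searchpath");
            for (String name : names) {
                Files.createFile(dir.resolve(name));
            }
            return dir;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path dir1 = tempDirWith("shared.afm", "only1.afm");
        Path dir2 = tempDirWith("shared.afm");
        SearchPathModel searchPath = new SearchPathModel();
        searchPath.addEntry(dir1.toString());
        searchPath.addEntry(dir2.toString());
        System.out.println(searchPath.resolve("shared.afm")); // found in dir2, the later entry
        System.out.println(searchPath.resolve("only1.afm"));  // falls back to dir1
    }
}
```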
In order to disable the search you can use a fully specified path name in the PDFlib functions. Note the following platform-specific features of the SearchPath resource category:
> On Windows PDFlib will initialize the SearchPath with an entry from the registry. The following registry entry may contain a list of path names separated by a semicolon ; character:
HKLM\SOFTWARE\PDFlib\PDFlib\8.0.0beta1\SearchPath
> On IBM iSeries the SearchPath resource category will be initialized with the following values:
/pdflib/8.0.0beta1/fonts
/pdflib/8.0.0beta1/bind/data
> On IBM zSeries systems with MVS the SearchPath feature is not supported.
> On OpenVMS logical names can be supplied as SearchPath.
Sample UPR file. The following listing gives an example of a UPR configuration file:
PS-Resources-1.0
SearchPath
/usr/local/lib/fonts
C:/psfonts/pfm
C:/psfonts
/users/kurt/my_images
.
FontAFM
Code-128=Code_128.afm
.
FontPFM
Corporate-Bold=corpb___.pfm
Mistral=c:/psfonts/pfm/mist____.pfm
.
FontOutline
Code-128=Code_128.pfa
ArialMT=Arial.ttf
.
HostFont
Wingdings=Wingdings
.
Encoding
myencoding=myencoding.enc
.
ICCProfile
highspeedprinter=cmykhighspeed.icc
.
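To make the UPR syntax rules concrete, here is a minimal Java reader for files shaped like the sample above. It is a sketch under stated simplifications, not PDFlib's parser: the class name UprParser is hypothetical, it assumes there is no optional category-list section (as in the sample), it does not enforce the 255-character limit, it does not preserve quoted equal signs, and it keeps entries as raw "name=value" strings.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch of a reader for the UPR syntax described above (not PDFlib's parser).
// Simplifications: no optional category-list section, no 255-character check,
// quoted equal signs are not preserved, entries stay raw "name=value" strings.
final class UprParser {
    static Map<String, List<String>> parse(List<String> physicalLines) {
        List<String> lines = joinContinuations(physicalLines);
        if (lines.isEmpty() || !clean(lines.get(0)).equals("PS-Resources-1.0")) {
            throw new IllegalArgumentException("missing magic line PS-Resources-1.0");
        }
        Map<String, List<String>> sections = new LinkedHashMap<>();
        String category = null; // null means "expecting a category name"
        for (String raw : lines.subList(1, lines.size())) {
            String line = clean(raw);
            if (line.isEmpty()) {
                continue; // blank or comment-only line
            }
            if (line.equals(".")) {
                category = null; // section terminator
            } else if (category == null) {
                category = line;
                sections.computeIfAbsent(category, k -> new ArrayList<>());
            } else {
                sections.get(category).add(line);
            }
        }
        return sections;
    }

    // A line ending in an odd number of backslashes continues on the next line.
    static List<String> joinContinuations(List<String> in) {
        List<String> out = new ArrayList<>();
        StringBuilder pending = new StringBuilder();
        for (String line : in) {
            int trailing = 0;
            for (int i = line.length() - 1; i >= 0 && line.charAt(i) == '\\'; i--) {
                trailing++;
            }
            if (trailing % 2 == 1) {
                pending.append(line, 0, line.length() - 1);
            } else {
                out.add(pending.append(line).toString());
                pending.setLength(0);
            }
        }
        if (pending.length() > 0) {
            out.add(pending.toString());
        }
        return out;
    }

    // Resolves backslash escapes (\\ and \%), drops an unescaped % comment, trims whitespace.
    static String clean(String line) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < line.length(); i++) {
            char ch = line.charAt(i);
            if (ch == '\\' && i + 1 < line.length()) {
                sb.append(line.charAt(++i));
            } else if (ch == '%') {
                break;
            } else {
                sb.append(ch);
            }
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        List<String> sample = List.of(
            "PS-Resources-1.0",
            "SearchPath",
            "/usr/local/lib/fonts",
            "C:/psfonts",
            ".",
            "FontOutline",
            "Code-128=Code_128.pfa",
            "ArialMT=Arial.ttf   % TrueType outline",
            ".");
        System.out.println(parse(sample));
    }
}
```

Using a LinkedHashMap keeps the sections in file order, which makes the result easy to compare with the original listing.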
Searching for the UPR resource file. If only the built-in resources (e.g., PDF core fonts, built-in encodings, sRGB ICC profile) or system resources (host fonts) are to be used, a UPR configuration file is not required since PDFlib will find all necessary resources without any additional configuration. If other resources are to be used you can specify such resources via calls to PDF_set_parameter( ) (see below) or in a UPR resource file. PDFlib reads this file automatically when the first resource is requested. The detailed process is as follows:
> On Unix, Linux and Mac OS X systems some directories will be searched by default for license and resource files even without specifying any path and directory names. Before searching and reading the UPR file, the following directories will be searched (in this order):
<rootpath>/PDFlib/PDFlib/8.0/resource/icc
<rootpath>/PDFlib/PDFlib/8.0/resource/fonts
<rootpath>/PDFlib/PDFlib/8.0/resource/cmap
<rootpath>/PDFlib/PDFlib/8.0
<rootpath>/PDFlib/PDFlib
<rootpath>/PDFlib
where <rootpath> will first be replaced with /usr/local and then with the HOME directory. This feature can be used to work with a license file, UPR file, or resources without setting any environment variables or runtime parameters.
> If the environment variable PDFLIBRESOURCE is defined, PDFlib takes its value as the name of the UPR file to be read. If this file cannot be read, an exception will be thrown.
> If the environment variable PDFLIBRESOURCE is not defined, PDFlib tries to open a file with the following name:
upr (on MVS; a dataset is expected)
pdflib/<version>/fonts/pdflib.upr (on IBM eServer iSeries)
pdflib.upr (Windows, Unix, and all other systems)
If this file cannot be read no exception will be thrown. > On Windows PDFlib will additionally try to read the registry entry
HKLM\SOFTWARE\PDFlib\PDFlib\8.0.0beta1\resourcefile
The value of this entry (which will be created by the PDFlib installer, but can also be created by other means) will be taken as the name of the resource file to be used. If this file cannot be read an exception will be thrown. Be careful when manually accessing the registry on 64-bit Windows systems: as usual, 64-bit PDFlib binaries will work with the 64-bit view of the Windows registry, while 32-bit PDFlib binaries running on a 64-bit system will work with the 32-bit view of the registry. If you must add registry keys for a 32-bit product manually, make sure to use the 32-bit version of the regedit tool. It can be invoked as follows from the Start, Run... dialog:
%systemroot%\syswow64\regedit
> The client can force PDFlib to read a resource file at runtime by explicitly setting the resourcefile parameter:
p.set_parameter("resourcefile", "/path/to/pdflib.upr");
This call can be repeated arbitrarily often; the resource entries will be accumulated.
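The Unix/Mac default directories and the choice of UPR file name can be summarized in a short Java sketch. It only models the rules listed above and is not PDFlib code; ResourceLookup and its methods are hypothetical names. One interpretation is an assumption: all six directories are taken with /usr/local before any of them is taken with the HOME directory. The MVS and iSeries file names and the Windows registry lookup are not modeled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Sketch (not PDFlib code) of the resource lookup described above for Unix,
// Linux and Mac OS X. Assumption: all six directories are tried with /usr/local
// before any of them is tried with the HOME directory.
final class ResourceLookup {
    private static final String[] SUFFIXES = {
        "PDFlib/PDFlib/8.0/resource/icc",
        "PDFlib/PDFlib/8.0/resource/fonts",
        "PDFlib/PDFlib/8.0/resource/cmap",
        "PDFlib/PDFlib/8.0",
        "PDFlib/PDFlib",
        "PDFlib"
    };

    // Expands <rootpath> first with /usr/local, then with the HOME directory.
    static List<String> defaultDirectories(String home) {
        List<String> dirs = new ArrayList<>();
        for (String root : new String[] {"/usr/local", home}) {
            if (root == null || root.isEmpty()) {
                continue; // no HOME directory known
            }
            for (String suffix : SUFFIXES) {
                dirs.add(root + "/" + suffix);
            }
        }
        return dirs;
    }

    // PDFLIBRESOURCE names the UPR file if set; otherwise pdflib.upr is tried
    // (the name used on Windows, Unix, and most other systems).
    static String uprFileName(Map<String, String> env) {
        return env.getOrDefault("PDFLIBRESOURCE", "pdflib.upr");
    }

    // An unreadable file is an error only if it was named via PDFLIBRESOURCE.
    static boolean missingFileIsError(Map<String, String> env) {
        return env.containsKey("PDFLIBRESOURCE");
    }

    public static void main(String[] args) {
        defaultDirectories(System.getenv("HOME")).forEach(System.out::println);
        System.out.println(uprFileName(System.getenv()));
    }
}
```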
Configuring resources at runtime. In addition to using a UPR file for the configuration, it is also possible to directly configure individual resources within the source code via the PDF_set_parameter( ) function. This function takes a category name and a corresponding resource entry as it would appear in the respective section of this category in a UPR resource file, for example:
p.set_parameter("FontAFM", "Foobar-Bold=foobb___.afm");
p.set_parameter("FontOutline", "Foobar-Bold=foobb___.pfa");
Note Font configuration is discussed in more detail in Section 5.3.1, Searching for Fonts, page 110. Querying resource values. In addition to setting resource entries you can query values using PDF_get_parameter( ). Specify the category name as key and the index in the list as modifier. For example, the following call:
s = p.get_parameter("SearchPath", n);
will retrieve the n-th entry in the SearchPath list. If n is larger than the number of available entries for the requested category, an empty string will be returned. The returned string is valid until the next call to any API function.
Note The PDF data in the buffer must be treated as binary data. This is considered active mode since the client decides when it wishes to fetch the buffer contents. Active mode is available for all supported language bindings.
Note C and C++ clients must not free the returned buffer.
The passive in-core PDF generation interface. In passive mode, which is only available in the C and C++ language bindings, the user installs (via PDF_open_document_callback( )) a callback function which will be called at unpredictable times by PDFlib whenever PDF data is waiting to be consumed. Timing and buffer size constraints related to flushing (transferring the PDF data from the library to the client) can be configured by the client in order to provide for maximum flexibility. Depending on the environment, it may be advantageous to fetch the complete PDF document at once, in multiple chunks, or in many small segments in order to prevent PDFlib from increasing the internal document buffer. The flushing strategy can be set using the flush option of PDF_open_document_callback( ).
In contrast, the following items must always be treated in binary mode (i.e., any conversion must be avoided):
> PDF input and output files
> PFB font outline and PFM font metrics files
> TrueType and OpenType font files
> image files and ICC profiles
The first coordinate increases to the right, the second coordinate increases upwards. PDFlib client programs may change the default user space by rotating, scaling, translating, or skewing, resulting in new user coordinates. The respective functions for these transformations are PDF_rotate( ), PDF_scale( ), PDF_translate( ), and PDF_skew( ). If the coordinate system has been transformed, all coordinates in graphics and text functions must be supplied according to the new coordinate system. The coordinate system is reset to the default coordinate system at the start of each page. Using metric coordinates. Metric coordinates can easily be used by scaling the coordinate system. The scaling factor is derived from the definition of the DTP point given above:
p.scale(28.3465, 28.3465);
After this call PDFlib will interpret all coordinates (except for interactive features, see below) in centimeters since 72/2.54 = 28.3465. As an alternative, the userunit option in PDF_begin/end_page_ext( ) (PDF 1.6) can be specified to supply a scaling factor for the whole page. Note that user units will only affect final page display in Acrobat, but not any coordinate scaling in PDFlib.
Cookbook A full code sample can be found in the Cookbook topic general/metric_topdown_coordinates.
Coordinates for interactive elements. PDF always expects coordinates for interactive functions, such as the rectangle coordinates for creating text annotations, links, and file annotations, in the default coordinate system, and not in the (possibly transformed) user coordinate system. Since this is very cumbersome, PDFlib offers automatic conversion of user coordinates to the format expected by PDF. This automatic conversion is activated by setting the usercoordinates parameter to true:
p.set_parameter("usercoordinates", "true");
Since PDF supports only link and field rectangles with edges parallel to the page edges, the supplied rectangles must be modified when the coordinate system has been transformed by scaling, rotating, translating, or skewing it. In this case PDFlib will calculate the smallest enclosing rectangle with edges parallel to the page edges, transform it to default coordinates, and use the resulting values instead of the supplied coordinates. The overall effect is that you can use the same coordinate systems for both page content and interactive elements when the usercoordinates parameter has been set to true.
Visualizing coordinates. In order to assist PDFlib users in working with PDF's coordinate system, the PDFlib distribution contains the PDF file grid.pdf which visualizes the coordinates for several common page sizes. Printing the appropriately sized page on transparent material may provide a useful aid during PDFlib development. Acrobat (not Adobe Reader) also has a helpful facility. Simply choose View, Navigation Tabs, Info to display a measurement palette. Note that the coordinates displayed refer to an origin in the top left corner of the page, and not PDF's default origin in the lower left corner. To change the display units go to Edit, Preferences, [General...], Units & Guides [or Page Units] and choose one of Points, Inches, Millimeters, Picas, Centimeters. You can also go to View, Navigation Tabs, Info and select a unit from the Options menu.
Don't be misled by PDF printouts which seem to have wrong page dimensions. These may be wrong for some common reasons:
> The Page Scaling: option in Acrobat's print dialog has a setting different from None, resulting in scaled print output.
> Non-PostScript printer drivers are not always able to retain the exact size of printed objects.
Rotating objects. It is important to understand that objects cannot be modified once they have been drawn on the page. Although there are PDFlib functions for rotating, translating, scaling, and skewing the coordinate system, these do not affect existing objects on the page but only subsequently drawn objects. Rotating text, images, and imported PDF pages can easily be achieved with the rotate option of PDF_fit_textline( ), PDF_fit_textflow( ), PDF_fit_image( ), and PDF_fit_pdi_page( ). Rotating such objects by multiples of 90 degrees inside the respective fitbox can be accomplished with the orientate option of these functions. The following example generates some text at an angle of 45 degrees:
p.fit_textline("Rotated text", 50.0, 700.0, "rotate=45");
Cookbook A full code sample can be found in the Cookbook topic text_output/rotated_text.
Rotation for vector graphics can be achieved by applying the general coordinate transformation functions PDF_translate( ) and PDF_rotate( ). The following example creates a rotated rectangle with lower left corner at (200, 100). It translates the coordinate origin to the desired corner of the rectangle, rotates the coordinate system, and places the rectangle at (0, 0). The save/restore nesting makes it easy to continue placing objects in the original coordinate system after the rotated rectangle is done:
p.save();
p.translate(200, 100);          /* move origin to corner of rectangle */
p.rotate(45.0);                 /* rotate coordinates */
p.rect(0.0, 0.0, 75.0, 25.0);   /* draw rotated rectangle */
p.stroke();
p.restore();
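The arithmetic behind these transformations can be checked with a small affine-transform model. This Java sketch is an illustration, not PDFlib code: the class Ctm and its methods are hypothetical, and it follows the PDF convention that a point (x, y) maps to (a*x + c*y + e, b*x + d*y + f), with each new operation concatenated in front of the current matrix. It replays the translate/rotate sequence above, computes the smallest axis-parallel rectangle around the result (the kind of enclosing rectangle the usercoordinates discussion describes), and reproduces the metric factor 72/2.54.

```java
// Minimal affine-transform sketch (not PDFlib code) following PDF's convention:
// a point (x, y) maps to (a*x + c*y + e, b*x + d*y + f), and each new
// operation is concatenated in front of the current matrix.
final class Ctm {
    final double a, b, c, d, e, f;

    Ctm() {
        this(1, 0, 0, 1, 0, 0);
    }

    Ctm(double a, double b, double c, double d, double e, double f) {
        this.a = a; this.b = b; this.c = c; this.d = d; this.e = e; this.f = f;
    }

    // Returns m x this, i.e. m is applied to user coordinates first.
    private Ctm concat(Ctm m) {
        return new Ctm(
            m.a * a + m.b * c, m.a * b + m.b * d,
            m.c * a + m.d * c, m.c * b + m.d * d,
            m.e * a + m.f * c + e, m.e * b + m.f * d + f);
    }

    Ctm translate(double tx, double ty) { return concat(new Ctm(1, 0, 0, 1, tx, ty)); }

    Ctm scale(double sx, double sy) { return concat(new Ctm(sx, 0, 0, sy, 0, 0)); }

    Ctm rotate(double degrees) {
        double r = Math.toRadians(degrees);
        return concat(new Ctm(Math.cos(r), Math.sin(r), -Math.sin(r), Math.cos(r), 0, 0));
    }

    double[] apply(double x, double y) {
        return new double[] {a * x + c * y + e, b * x + d * y + f};
    }

    // Smallest axis-parallel rectangle {llx, lly, urx, ury} in default coordinates
    // that encloses the user-space rectangle (x, y, width, height).
    double[] enclosingRect(double x, double y, double width, double height) {
        double[][] corners = {
            apply(x, y), apply(x + width, y), apply(x + width, y + height), apply(x, y + height)};
        double[] r = {Double.MAX_VALUE, Double.MAX_VALUE, -Double.MAX_VALUE, -Double.MAX_VALUE};
        for (double[] p : corners) {
            r[0] = Math.min(r[0], p[0]);
            r[1] = Math.min(r[1], p[1]);
            r[2] = Math.max(r[2], p[0]);
            r[3] = Math.max(r[3], p[1]);
        }
        return r;
    }

    public static void main(String[] args) {
        // Same sequence as the rotated-rectangle example: translate(200, 100), rotate(45).
        Ctm ctm = new Ctm().translate(200, 100).rotate(45);
        System.out.println(java.util.Arrays.toString(ctm.enclosingRect(0, 0, 75, 25)));
        // Metric scaling: 1 unit in user space becomes 72/2.54 (about 28.3465) points.
        System.out.println(java.util.Arrays.toString(new Ctm().scale(72 / 2.54, 72 / 2.54).apply(1, 1)));
    }
}
```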
Using top-down coordinates. Unlike PDF's bottom-up coordinate system, some graphics environments use top-down coordinates which may be preferred by some developers. Such a coordinate system can easily be established using PDFlib's transformation functions. However, since the transformations will also affect text output (text would be flipped vertically), additional calls are required in order to avoid text being displayed in a mirrored sense. In order to facilitate the use of top-down coordinates PDFlib supports a special mode in which all relevant coordinates will be interpreted differently. The topdown feature has been designed to make it quite natural for PDFlib users to work in a top-down coordinate system. Instead of working with the default PDF coordinate system with the origin (0, 0) at the lower left corner of the page and y coordinates increasing upwards, a modified coordinate system will be used which has its origin at the upper left corner of the page with y coordinates increasing downwards. This top-down coordinate system for a page can be activated with the topdown option of PDF_begin_page_ext( ):
p.begin_page_ext(595.0, 842.0, "topdown");
Alternatively, the topdown parameter can be used, but it must not be set within a page description (only between pages). For the sake of completeness we'll list the detailed consequences of establishing a top-down coordinate system below.
Absolute coordinates will be interpreted in the user coordinate system without any modification:
> All function parameters which are designated as coordinates in the function descriptions. Some examples: x, y in PDF_moveto( ); x, y in PDF_circle( ); x, y (but not width and height!) in PDF_rect( ); llx, lly, urx, ury in PDF_create_annotation( ).
Relative coordinate values will be modified internally to match the top-down system:
> Text (with positive font size) will be oriented towards the top of the page;
> When the manual talks about the lower left corner of a rectangle, box etc. this will be interpreted as you see it on the page;
> When a rotation angle is specified the center of the rotation is still the origin (0, 0) of the user coordinate system. The visual result of a clockwise rotation will still be clockwise.
Cookbook A full code sample can be found in the Cookbook topic general/metric_topdown_coordinates.
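For absolute coordinates the relationship between the two systems is a simple vertical flip. The following Java sketch (hypothetical class TopDown, not PDFlib code) converts points between top-down and default coordinates for a page of a given height; mathematically this is the transformation matrix [1 0 0 -1 0 h]. It does not model the text orientation handling described above.

```java
// Sketch (not PDFlib code): a top-down system for a page of height h corresponds
// to the transformation matrix [1 0 0 -1 0 h], so y flips while x is unchanged.
final class TopDown {
    // Converts a top-down point to PDF's default bottom-up coordinates.
    static double[] toDefault(double x, double yTopDown, double pageHeight) {
        return new double[] {x, pageHeight - yTopDown};
    }

    // The flip is its own inverse, so converting back uses the same formula.
    static double[] toTopDown(double x, double yDefault, double pageHeight) {
        return toDefault(x, yDefault, pageHeight);
    }

    public static void main(String[] args) {
        // Page height as in p.begin_page_ext(595.0, 842.0, "topdown").
        double[] p = toDefault(50, 100, 842);
        System.out.println(p[0] + ", " + p[1]);
    }
}
```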
Different page size boxes. While many PDFlib developers only specify the width and height of a page, some advanced applications (especially for prepress work) may want to specify one or more of PDF's additional box entries. PDFlib supports all of PDF's box entries. The following entries, which may be useful in certain environments, can be specified by PDFlib clients (definitions taken from the PDF reference):
> MediaBox: this is used to specify the width and height of a page, and describes what we usually consider the page size.
> CropBox: the region to which the page contents are to be clipped; Acrobat uses this size for screen display and printing.
> TrimBox: the intended dimensions of the finished (possibly cropped) page;
> ArtBox: extent of the page's meaningful content. It is rarely used by application software;
> BleedBox: the region to which the page contents are to be clipped when output in a production environment. It may encompass additional bleed areas to account for inaccuracies in the production process.
PDFlib will not use any of these values apart from recording them in the output file. By default PDFlib generates a MediaBox according to the specified width and height of the page, but does not generate any of the other entries. The following code fragment will start a new page and set the four values of the CropBox:
/* start a new page with custom CropBox */
p.begin_page_ext(595, 842, "cropbox={10 10 500 800}");
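When box values are computed at runtime, it can help to build the option string in one place. The following Java helper is hypothetical (BoxOption is not part of PDFlib); it reproduces the syntax of the cropbox example above and assumes the four values are llx lly urx ury in points, which matches the example's values.

```java
import java.math.BigDecimal;

// Hypothetical helper (not part of PDFlib) that formats a page box option such as
// "cropbox={10 10 500 800}" for PDF_begin_page_ext(). Assumption: the four values
// are llx lly urx ury, i.e. lower left and upper right corners in points.
final class BoxOption {
    static String format(String boxName, double llx, double lly, double urx, double ury) {
        if (llx >= urx || lly >= ury) {
            throw new IllegalArgumentException("expected llx < urx and lly < ury");
        }
        return boxName + "={" + num(llx) + " " + num(lly) + " " + num(urx) + " " + num(ury) + "}";
    }

    // Prints 10.0 as "10" and 10.5 as "10.5" to keep option lists readable.
    private static String num(double v) {
        return BigDecimal.valueOf(v).stripTrailingZeros().toPlainString();
    }

    public static void main(String[] args) {
        // Same values as the CropBox example above.
        System.out.println(format("cropbox", 10, 10, 500, 800));
    }
}
```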
Number of pages in a document. There is no limit in PDFlib regarding the number of generated pages in a document. PDFlib generates PDF structures which allow Acrobat to efficiently navigate documents with hundreds of thousands of pages.
any appearance properties (e.g. color, line width) of a path you must do so before starting any drawing operations. These rules can be summarized as "don't change the appearance within a path description". Merely constructing a path doesn't result in anything showing up on the page; you must either fill or stroke the path in order to get visible results:
p.setcolor("stroke", "rgb", 1, 0, 0, 0);
p.moveto(100, 100);
p.lineto(200, 100);
p.stroke();
Most graphics functions make use of the concept of a current point, which can be thought of as the location of the pen used for drawing.
Cookbook A full code sample can be found in the Cookbook topic graphics/starter_graphics.
Path objects. Path objects are a more convenient and powerful alternative to direct paths. Path objects encapsulate all drawing operations for constructing the path. Path objects can be created with PDF_add_path_point( ) or extracted from an image file which includes an image clipping path (see below). PDF_add_path_point( ) supports several convenience options to facilitate path construction. Once a path object has been created it can be used for different purposes:
> The path object can be used in the page description with PDF_draw_path( ), i.e. filled, stroked, or used as a clipping path.
> Path objects can be used as wrap shapes for Textflow: the text will be formatted so that it wraps inside or outside of an arbitrary shape (see Section 8.2.9, Wrapping Text around Paths and Images, page 194).
> Text can also be placed on a path, i.e. the characters follow the lines and curves of the path (see Section 8.1.7, Text on a Path, page 177).
> Path objects can be placed in table cells.
Unlike direct paths, path objects can be used again and again until they are explicitly destroyed with PDF_delete_path( ). Information about a path can be retrieved with PDF_info_path( ). The following code fragment creates a simple path shape with a circle, strokes it at two different locations on the page, and finally deletes it:
path = p.add_path_point(-1, 0, 100, "move", "");
path = p.add_path_point(path, 200, 100, "control", "");
path = p.add_path_point(path, 0, 100, "circular", "");
p.draw_path(path, 0, 0, "stroke");
p.draw_path(path, 400, 500, "stroke");
p.delete_path(path);
Instead of creating a path object with individual drawing operations you can extract the clipping path from an imported image:
image = p.load_image("auto", "image.tif", "clippingpathname={path 1}");
/* create a path object from the image's clipping path */
path = (int) p.info_image(image, "clippingpath", "");
if (path == -1)
    throw new Exception("Error: clipping path not found!");
p.draw_path(path, 0, 0, "stroke");
3.2.4 Templates
Templates in PDF. PDFlib supports a PDF feature with the technical name Form XObjects. However, since this term conflicts with interactive forms we refer to this feature as templates. A PDFlib template can be thought of as an off-page buffer into which text, vector, and image operations are redirected (instead of acting on a regular page). After the template is finished it can be used much like a raster image, and placed an arbitrary number of times on arbitrary pages. Like images, templates can be subjected to geometrical transformations such as scaling or skewing. When a template is used on multiple pages (or multiply on the same page), the actual PDF operators for constructing the template are only included once in the PDF file, thereby saving PDF output file size. Templates suggest themselves for elements which appear repeatedly on several pages, such as a constant background, a company logo, or graphical elements emitted by CAD and geographical mapping software. Other typical examples for template usage include crop and registration marks or custom Asian glyphs. Using templates with PDFlib. Templates can only be defined outside of a page description, and can be used within a page description. However, templates may also contain other templates. Obviously, using a template within its own definition is not possible. Referring to an already defined template on a page is achieved with the PDF_fit_image( ) function just like images are placed on the page (see Section 7.3, Placing Images and imported PDF Pages, page 164). The general template idiom in PDFlib looks as follows:
/* define the template */
template = p.begin_template_ext(template_width, template_height, "");
/* ...place marks on the template using text, vector, and image functions... */
p.end_template_ext(0, 0);
/* ... */
p.begin_page(page_width, page_height);
/* use the template */
p.fit_image(template, 0.0, 0.0, "");
/* ...more page marking operations... */
p.end_page();
/* ... */
p.close_image(template);
All text, graphics, and color functions can be used on a template. However, the following functions must not be used while constructing a template:
> PDF_load_image( ): this is not a big restriction since images can be opened outside of a template definition, and freely be used within a template (but not opened).
> All interactive functions, since these must always be defined on the page where they should appear in the document, and cannot be generated as part of a template.
Cookbook A full code sample can be found in the Cookbook topic general/repeated_contents.
Cookbook A full code sample can be found in the Cookbook topic color/color_gradient .
examples. Generally, PANTONE color names must be constructed according to the following scheme:
PANTONE <id> <paperstock>
where <id> is the identifier of the color (e.g., 185) and <paperstock> the abbreviation of the paper stock in use (e.g., C for coated). A single space character must be provided between all components constituting the swatch name. If a spot color is requested where the name starts with the PANTONE prefix, but the name does not represent a valid PANTONE color, the function call will fail. The following code snippet demonstrates the use of a PANTONE color with a tint value of 70 percent:
spot = p.makespotcolor("PANTONE 281 U");
p.setcolor("fill", "spot", spot, 0.7, 0, 0);
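When PANTONE names are assembled from data (for example, a color id and a paper stock stored separately), a small helper can enforce the single-space scheme and convert percentages into the 0..1 tint passed to setcolor( ). This Java sketch is hypothetical (PantoneName is not part of PDFlib). It only checks the shape of the name; whether the color exists is decided by PDFlib, which fails the call for invalid PANTONE names. Leaving case untouched is an assumption based on the case-sensitivity rule stated for HKS names.

```java
// Hypothetical helper (not part of PDFlib) that assembles a PANTONE swatch name
// following the scheme "PANTONE <id> <paperstock>" with single spaces, and converts
// a tint given in percent to the 0..1 value passed to setcolor().
final class PantoneName {
    static String build(String id, String paperStock) {
        String normalizedId = id.trim().replaceAll("\\s+", " "); // "DS  35-1" -> "DS 35-1"
        String stock = paperStock.trim();
        if (normalizedId.isEmpty() || stock.isEmpty()) {
            throw new IllegalArgumentException("identifier and paper stock are required");
        }
        // Assumption: names are case-sensitive, so no case conversion is applied.
        return "PANTONE " + normalizedId + " " + stock;
    }

    static double tint(double percent) {
        if (percent < 0 || percent > 100) {
            throw new IllegalArgumentException("tint must be between 0 and 100 percent");
        }
        return percent / 100.0;
    }

    public static void main(String[] args) {
        // Corresponds to makespotcolor("PANTONE 281 U") with a tint of 0.7.
        System.out.println(build("281", "U") + " at tint " + tint(70));
    }
}
```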
Note PANTONE colors displayed here may not match PANTONE-identified standards. Consult current PANTONE Color Publications for accurate color. PANTONE and other Pantone, Inc. trademarks are the property of Pantone, Inc. Pantone, Inc., 2003. Note PANTONE colors are not supported in PDF/X-1a mode.
Table 3.3 PANTONE spot color libraries built into PDFlib
color library name                    sample color name  remarks
PANTONE solid coated                  PANTONE 185 C
PANTONE solid uncoated                PANTONE 185 U
PANTONE solid matte                   PANTONE 185 M      introduced in May 2006
PANTONE process coated                PANTONE DS 35-1 C
PANTONE process uncoated              PANTONE DS 35-1 U
PANTONE process coated EURO           PANTONE DE 35-1 C
PANTONE process uncoated EURO         PANTONE DE 35-1 U
PANTONE pastel coated                 PANTONE 9461 C     includes new colors introduced in 2006
PANTONE pastel uncoated               PANTONE 9461 U     includes new colors introduced in 2006
PANTONE metallic coated               PANTONE 871 C      includes new colors introduced in 2006
PANTONE color bridge CMYK PC          PANTONE 185 PC     replaces PANTONE solid to process coated
PANTONE color bridge CMYK EURO        PANTONE 185 EC     replaces PANTONE solid to process coated EURO
PANTONE color bridge uncoated         PANTONE 185 UP     introduced in July 2006
PANTONE hexachrome coated             PANTONE H 305-1 C  not recommended; will be discontinued
PANTONE hexachrome uncoated           PANTONE H 305-1 U  not recommended; will be discontinued
PANTONE solid in hexachrome coated    PANTONE 185 HC
PANTONE solid to process coated       PANTONE 185 PC     replaced by PANTONE color bridge CMYK PC
PANTONE solid to process coated EURO  PANTONE 185 EC     replaced by PANTONE color bridge CMYK EURO
PANTONE Goe coated                    PANTONE 42-1-1 C   2058 colors introduced in 2008
PANTONE Goe uncoated                  PANTONE 42-1-1 U   2058 colors introduced in 2008
HKS colors. The HKS color system is widely used in Germany and other European countries. PDFlib fully supports HKS colors. All color swatch names from the following digital color libraries (Farbfächer) can be used (sample swatch names are provided in parentheses):
> HKS K (Kunstdruckpapier) for gloss art paper, 88 colors (HKS 43 K)
> HKS N (Naturpapier) for natural paper, 86 colors (HKS 43 N)
> HKS E (Endlospapier) for continuous stationery/coated, 88 colors (HKS 43 E)
> HKS Z (Zeitungspapier) for newsprint, 50 colors (HKS 43 Z)
Commercial PDFlib customers can request a text file with the full list of HKS spot color names from our support. Spot color names are case-sensitive; use uppercase as shown in the examples. The HKS prefix must always be provided in the swatch name as shown in the examples. Generally, HKS color names must be constructed according to one of the following schemes:
HKS <id> <paperstock>
where <id> is the identifier of the color (e.g., 43) and <paperstock> the abbreviation of the paper stock in use (e.g., N for natural paper). A single space character must be provided between the HKS, <id>, and <paperstock> components constituting the swatch name. If a spot color is requested where the name starts with the HKS prefix, but the name does not represent a valid HKS color, the function call will fail. The following code snippet demonstrates the use of an HKS color with a tint value of 70 percent:
spot = p.makespotcolor("HKS 38 E");
p.setcolor("fill", "spot", spot, 0.7, 0, 0);
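Because a malformed HKS name makes the function call fail, an application may want to check names before passing them to makespotcolor( ). This hypothetical Java checker (HksName, not part of PDFlib) validates the "HKS <id> <paperstock>" shape and maps the paper stock letter to the libraries listed above. Assumptions: <id> is numeric, as in the sample names, and only the four stocks K, N, E, Z are accepted. It cannot tell whether a particular id exists in a library.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical checker (not part of PDFlib) for the scheme "HKS <id> <paperstock>".
// Assumptions: <id> is numeric (as in HKS 43 K) and <paperstock> is one of the
// four libraries K, N, E, Z. It cannot tell whether a given id actually exists.
final class HksName {
    private static final Pattern SCHEME = Pattern.compile("HKS (\\d+) ([KNEZ])");
    private static final Map<String, String> PAPER = Map.of(
        "K", "gloss art paper",
        "N", "natural paper",
        "E", "continuous stationery",
        "Z", "newsprint");

    // Exact match: uppercase prefix and single spaces, since names are case-sensitive.
    static boolean isWellFormed(String name) {
        return SCHEME.matcher(name).matches();
    }

    // Returns the paper stock description, or null for malformed names.
    static String paperStock(String name) {
        Matcher m = SCHEME.matcher(name);
        return m.matches() ? PAPER.get(m.group(2)) : null;
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("HKS 38 E") + " " + paperStock("HKS 38 E"));
    }
}
```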
User-defined spot colors. In addition to built-in spot colors as detailed above, PDFlib supports custom spot colors. These can be assigned an arbitrary name (which must not conflict with the name of any built-in color, however) and an alternate color which will be used for screen preview or low-quality printing, but not for high-quality color separations. The client is responsible for providing suitable alternate colors for custom spot colors. There is no separate PDFlib function for setting the alternate color for a new spot color; instead, the current fill color will be used. Except for an additional call to set the alternate color, defining and using custom spot colors works similarly to using built-in spot colors:
p.setcolor("fill", "cmyk", 0.2, 1.0, 0.2, 0);  /* define alternate CMYK values */
spot = p.makespotcolor("CompanyLogo");         /* derive a spot color from it */
p.setcolor("fill", "spot", spot, 1, 0, 0);      /* set the spot color */
Device-Independent CIE L*a*b* Color. Device-independent color values can be specified in the CIE 1976 L*a*b* color space by supplying the color space name lab to PDF_setcolor( ). Colors in the L*a*b* color space are specified by a luminance value in the range 0-100, and two color values in the range -127 to 128. The illuminant used for the lab color space will be D50 (daylight 5000K, 2° observer).
Rendering Intents. Although PDFlib clients can specify device-independent color values, a particular output device is not necessarily capable of accurately reproducing the required colors. In this situation some compromises have to be made regarding the trade-offs in a process called gamut compression, i.e., reducing the range of colors to a smaller range which can be reproduced by a particular device. The rendering intent can be used to control this process. Rendering intents can be specified for individual images by supplying the renderingintent parameter or option to PDF_load_image( ). In addition, rendering intents can be specified for text and vector graphics by supplying the renderingintent option to PDF_create_gstate( ).
ICC profiles. The International Color Consortium (ICC)1 defined a file format for specifying color characteristics of input and output devices. These ICC color profiles are considered an industry standard, and are supported by all major color management system and application vendors. PDFlib supports color management with ICC profiles in the following areas:
> Define ICC-based color spaces for text and vector graphics on the page.
> Process ICC profiles embedded in imported image files.
> Apply an ICC profile to an imported image (possibly overriding an ICC profile embedded in the image).
> Define default color spaces for mapping grayscale, RGB, or CMYK data to ICC-based color spaces.
> Define a PDF/X or PDF/A output intent by means of an external ICC profile.
Color management does not change the number of components in a color specification (e.g., from RGB to CMYK).
Note ICC color profiles for common printing conditions are available for download from www.pdflib.com, as well as links to other freely available ICC profiles.
Searching for ICC profiles. PDFlib will search for ICC profiles according to the following steps, using the profilename parameter supplied to PDF_load_iccprofile( ):
> If profilename=sRGB, PDFlib will use its internal sRGB profile (see below), and terminate the search.
> Check whether there is a resource named profilename in the ICCProfile resource category. If so, use its value as file name in the following steps. If there is no such resource, use profilename as a file name directly.
> Use the file name determined in the previous step to locate a disk file by trying the following combinations one after another:
<filename>
<filename>.icc
<filename>.icm
<colordir>/<filename>
<colordir>/<filename>.icc
<colordir>/<filename>.icm
1. See www.color.org
On Windows colordir designates the directory where device-specific ICC profiles are stored by the operating system (typically C:\WINNT\system32\spool\drivers\color). On Mac OS X the following paths will be tried for colordir:
/System/Library/ColorSync/Profiles
/Library/ColorSync/Profiles
/Network/Library/ColorSync/Profiles
~/Library/ColorSync/Profiles
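The search order described above can be sketched as a small Java helper. It does not open any files. It only lists candidate file names in the order PDFlib is described as trying them, and a caller would stop at the first name that exists. This is not PDFlib code. The class name IccSearch, the Map that stands in for the ICCProfile resource category, the exact (case-sensitive) match on "sRGB", and the assumption that the three colordir variants are tried per directory in the order listed are all illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: lists ICC profile file name candidates in search order. */
final class IccSearch {
    private IccSearch() {}

    // Suffixes tried after the plain file name: <filename>, <filename>.icc, <filename>.icm
    private static final String[] SUFFIXES = {"", ".icc", ".icm"};

    /**
     * @param profilename  the name passed to PDF_load_iccprofile()
     * @param iccResources stand-in for the ICCProfile resource category (assumption)
     * @param colordirs    system color directories; empty where no colordir exists
     * @return candidate file names; an empty list means the internal sRGB profile is used
     */
    static List<String> candidates(String profilename, Map<String, String> iccResources,
                                   List<String> colordirs) {
        List<String> result = new ArrayList<>();
        if (profilename.equals("sRGB")) {
            return result; // built-in profile, no file search at all
        }
        // A resource entry supplies the file name; otherwise the profile name is used directly.
        String filename = iccResources.getOrDefault(profilename, profilename);
        for (String suffix : SUFFIXES) {
            result.add(filename + suffix);
        }
        // Directory-based candidates are only generated if color directories were supplied.
        for (String dir : colordirs) {
            for (String suffix : SUFFIXES) {
                result.add(dir + "/" + filename + suffix);
            }
        }
        return result;
    }
}
```

Passing an empty directory list models systems without a colordir. On those systems only the three plain file-name variants are produced.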
On other systems the steps involving colordir will be omitted.
The sRGB color space and sRGB ICC profile. PDFlib supports the industry-standard RGB color space called sRGB (formally IEC 61966-2-1). sRGB is supported by a variety of software and hardware vendors and is widely used for simplified color management for consumer RGB devices such as digital still cameras, office equipment such as color printers, and monitors. PDFlib supports the sRGB color space and includes the required ICC profile data internally. Therefore an sRGB profile need not be configured explicitly by the client; it is always available without any additional configuration. It can be requested by calling PDF_load_iccprofile( ) with profilename=sRGB.
Using embedded profiles in images (ICC-tagged images). Some images may contain embedded ICC profiles describing the nature of the image's color values. For example, an embedded ICC profile can describe the color characteristics of the scanner used to produce the image data. PDFlib can handle embedded ICC profiles in the PNG, JPEG, and TIFF image file formats. If the honoriccprofile option or parameter is set to true (which is the default) the ICC profile embedded in an image will be extracted from the image, and embedded in the PDF output such that Acrobat will apply it to the image. This process is sometimes referred to as tagging an image with an ICC profile. PDFlib will not alter the image's pixel values.
The image:iccprofile parameter can be used to obtain an ICC profile handle for the profile embedded in an image. This may be useful when the same profile is to be applied to multiple images. In order to check the number of color components in an unknown ICC profile use the icccomponents parameter.
Applying external ICC profiles to images (tagging).
As an alternative to using ICC profiles embedded in an image, an external profile may be applied to an individual image by supplying a profile handle along with the iccprofile option to PDF_load_image( ). ICC-based color spaces for page descriptions. The color values for text and vector graphics can directly be specified in the ICC-based color space specified by a profile. The color space must first be set by supplying the ICC profile handle as value to one of the setcolor:iccprofilegray, setcolor:iccprofilergb, setcolor:iccprofilecmyk parameters. Subsequently ICC-based color values can be supplied to PDF_setcolor( ) along with one of the color space keywords iccbasedgray, iccbasedrgb, or iccbasedcmyk:
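p.set_parameter("errorpolicy", "return");
icchandle = p.load_iccprofile(...);
if (icchandle == -1)
{
    /* Error handling */
    return;
}
/* select the ICC-based CMYK color space, then set a fill color in it */
p.set_value("setcolor:iccprofilecmyk", icchandle);
p.setcolor("fill", "iccbasedcmyk", 0, 1, 0, 0);  /* example CMYK values */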
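The number of components in a profile decides which of setcolor:iccprofilegray, setcolor:iccprofilergb, or setcolor:iccprofilecmyk should receive its handle. Within PDFlib the icccomponents parameter reports that number. As an independent illustration that does not use PDFlib, the Java sketch below reads the data color space signature stored in bytes 16 to 19 of the 128-byte ICC profile header (as laid out in the ICC specification) and maps it to a component count and to the matching parameter name. The class name IccHeader is hypothetical. Only the GRAY, RGB and CMYK signatures are handled.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: derives the component count from an ICC profile header. */
final class IccHeader {
    private IccHeader() {}

    /** Returns 1, 3 or 4 for GRAY, RGB or CMYK profiles, and -1 for any other color space. */
    static int components(byte[] profile) {
        if (profile.length < 128) {
            throw new IllegalArgumentException("an ICC profile header is 128 bytes long");
        }
        // Bytes 16-19 hold the four-character data color space signature.
        String signature = new String(profile, 16, 4, StandardCharsets.US_ASCII);
        switch (signature) {
            case "GRAY": return 1;
            case "RGB ": return 3; // note the trailing space in the signature
            case "CMYK": return 4;
            default:     return -1;
        }
    }

    /** Maps a component count to the setcolor:iccprofile* parameter named in this section. */
    static String setcolorParameter(int components) {
        switch (components) {
            case 1: return "setcolor:iccprofilegray";
            case 3: return "setcolor:iccprofilergb";
            case 4: return "setcolor:iccprofilecmyk";
            default: throw new IllegalArgumentException("no setcolor parameter for " + components + " components");
        }
    }
}
```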
Mapping device colors to ICC-based default color spaces. PDF provides a feature for mapping device-dependent gray, RGB, or CMYK colors in a page description to device-independent colors. This can be used to attach a precise colorimetric specification to color values which otherwise would be device-dependent. Mapping color values this way is accomplished by supplying a DefaultGray, DefaultRGB, or DefaultCMYK color space definition. In PDFlib it can be achieved by setting the defaultgray, defaultrgb, or defaultcmyk options of PDF_begin_page_ext( ) and supplying an ICC profile handle as the corresponding value. The following example sets the sRGB color space as the default RGB color space for text, images, and vector graphics:
/* sRGB is guaranteed to be always available */
icchandle = p.load_iccprofile("sRGB", 0, "usage=iccbased");
p.begin_page_ext(595, 842, "defaultrgb=" + icchandle);
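When more than one default color space is used, it can be convenient to assemble the option list for PDF_begin_page_ext( ) from the handles instead of concatenating strings by hand. The following is a minimal Java sketch with a hypothetical class name. It assumes the handles were obtained with usage=iccbased, as in the example above. Leaving out any handle equal to -1 (a failed load under errorpolicy=return) is a design choice of this sketch and is not documented PDFlib behavior.

```java
/** Hypothetical helper: builds the defaultgray/defaultrgb/defaultcmyk option list. */
final class DefaultColorSpaces {
    private DefaultColorSpaces() {}

    static String optlist(int grayHandle, int rgbHandle, int cmykHandle) {
        StringBuilder sb = new StringBuilder();
        append(sb, "defaultgray", grayHandle);
        append(sb, "defaultrgb", rgbHandle);
        append(sb, "defaultcmyk", cmykHandle);
        return sb.toString();
    }

    // Adds "key=handle", separated by a space, unless the handle signals a failed load.
    private static void append(StringBuilder sb, String key, int handle) {
        if (handle == -1) {
            return;
        }
        if (sb.length() > 0) {
            sb.append(' ');
        }
        sb.append(key).append('=').append(handle);
    }
}
```

A call such as p.begin_page_ext(595, 842, DefaultColorSpaces.optlist(-1, icchandle, -1)) would then be equivalent to the example above.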
Defining output intents for PDF/X and PDF/A. An output device (printer) profile can be used to specify an output condition for PDF/X. This is done by supplying usage=outputintent in the call to PDF_load_iccprofile( ). For PDF/A any kind of profile can be specified as output intent. For details see Section 10.4, PDF/X for Print Production, page 242, and Section 10.5, PDF/A for Archiving, page 249.
with a thin black border. Initially this is convenient for precise positioning, but we disabled the border with linewidth=0.
normalfont = p.load_font("Helvetica", "unicode", "");
p.begin_page_ext(pagewidth, pageheight, "topdown");

/* place the text line "Kraxi Systems, Inc." using a matchbox */
String optlist = "font=" + normalfont + " fontsize=8 position={left top} " +
    "matchbox={name=kraxi} fillcolor={rgb 0 0 1} underline";
p.fit_textline("Kraxi Systems, Inc.", 2, 20, optlist);

/* create URI action */
optlist = "url={http://www.kraxi.com}";
int act = p.create_action("URI", optlist);

/* create Link annotation on matchbox "kraxi" */
optlist = "action={activate " + act + "} linewidth=0 usematchbox={kraxi}";

/* 0 rectangle coordinates will be replaced with matchbox coordinates */
p.create_annotation(0, 0, 0, 0, "Link", optlist);

p.end_page_ext("");
For an example of creating a Web link on an image or on parts of a textflow, see Section 8.4, Matchboxes, page 214.
Cookbook A full code sample can be found in the Cookbook topic interactive/link_annotations.
Bookmark for jumping to another file. Now let's create the bookmark Our Paper Planes Catalog which jumps to another PDF file called paper_planes_catalog.pdf. First we create an action of Type GoToR. In the option list for this action we define the name of the target document with the filename option; the destination option specifies a certain part of the page which will be enlarged. More precisely, the document will be displayed on the second page (page 2) with a fixed view (type fixed), where the middle of the page is visible (left 50 top 200) and the zoom factor is 200% (zoom 2):
String optlist = "filename=paper_planes_catalog.pdf " +
    "destination={page 2 type fixed left 50 top 200 zoom 2}";
goto_action = p.create_action("GoToR", optlist);
In the next step we create the actual bookmark. The action option for the bookmark contains the activate event which will trigger the action, plus the goto_action handle created above for the desired action. The option fontstyle bold specifies bold text, and textcolor {rgb 0 0 1} makes the bookmark blue. The bookmark text Our Paper Planes Catalog is provided as a function parameter:
String optlist = "action={activate " + goto_action + "} fontstyle=bold textcolor={rgb 0 0 1}";
catalog_bookmark = p.create_bookmark("Our Paper Planes Catalog", optlist);
Clicking the bookmark will display the specified part of the page in the target document.
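If several bookmarks jump into different files or pages, the GoToR option list used above can be generated from its parts. The following is a minimal Java sketch with a hypothetical class name. It covers only the fixed-view destination form shown above (page, left, top, zoom). It also assumes file names without spaces; such names would additionally need to be enclosed in braces, which this sketch does not do.

```java
/** Hypothetical helper: builds a GoToR option list with a fixed-view destination. */
final class GoToRAction {
    private GoToRAction() {}

    static String optlist(String filename, int page, double left, double top, double zoom) {
        return "filename=" + filename
             + " destination={page " + page + " type fixed left " + fmt(left)
             + " top " + fmt(top) + " zoom " + fmt(zoom) + "}";
    }

    // Prints whole numbers without ".0" so the output matches the hand-written form above.
    private static String fmt(double v) {
        return v == Math.rint(v) ? Long.toString((long) v) : Double.toString(v);
    }
}
```

The result would be passed on unchanged, e.g. goto_action = p.create_action("GoToR", GoToRAction.optlist("paper_planes_catalog.pdf", 2, 50, 200, 2)).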
Cookbook A full code sample can be found in the Cookbook topic interactive/nested_bookmarks. Annotation with file attachment. In the next example we create a file attachment. We start by creating an annotation of type FileAttachment. The filename option specifies the name of the attachment, the option mimetype image/gif specifies its type (MIME is a common convention for classifying file contents). The annotation will be displayed as a pushpin (iconname pushpin) in red (annotcolor {rgb 1 0 0}) and has a tooltip (contents {Get the Kraxi Paper Plane!}). It will not be printed (display noprint):
String optlist = "filename=kraxi_logo.gif mimetype=image/gif iconname=pushpin " +
    "annotcolor={rgb 1 0 0} contents={Get the Kraxi Paper Plane!} display=noprint";
p.create_annotation(left_x, left_y, right_x, right_y, "FileAttachment", optlist);
Note that the size of the symbol defined with iconname does not vary; the icon will be displayed in its standard size in the top left corner of the specified rectangle.
The action option for the button form field contains the up event (in Acrobat: Mouse Up) as a trigger for executing the action, plus the print_action handle created above for the action itself. The backgroundcolor {rgb 1 1 0} option specifies a yellow background, while bordercolor {rgb 0 0 0} specifies a black border. The option caption Print adds the text Print to the button, and tooltip {Print the document} creates an additional explanation for the user. The font option specifies the font using the button_font handle created above. By default, the size of the caption will be adjusted so that it completely fits into the button's area. Finally, the actual button form field is created with proper coordinates, the name print_button, the type pushbutton and the appropriate options:
String optlist = "action={up " + print_action + "} backgroundcolor={rgb 1 1 0} " +
    "bordercolor={rgb 0 0 0} caption=Print tooltip={Print the document} font=" + button_font;
p.create_field(left_x, left_y, right_x, right_y, "print_button", "pushbutton", optlist);
Now we extend the first version of the button by replacing the text Print with a little printer icon. To achieve this we load the corresponding image file print_icon.jpg as a template before creating the page. Using the icon option we assign the template handle print_icon to the button field, and create the form field similarly to the code above:
print_icon = p.load_image("auto", "print_icon.jpg", "template");
if (print_icon == -1)
{
    /* Error handling */
    return;
}
p.begin_page_ext(pagewidth, pageheight, "");
...
String optlist = "action={up " + print_action + "} icon=" + print_icon +
    " tooltip={Print the document} font=" + button_font;
p.create_field(left_x, left_y, right_x, right_y, "print_button", "pushbutton", optlist);
Cookbook A full code sample can be found in the Cookbook topic interactive/form_pushbutton. Simple text field. Now we create a text field near the upper right corner of the page. The user will be able to enter the current date in this field. We acquire a font handle and create a form field of type textfield which is called date, and has a gray background:
textfield_font = p.load_font("Helvetica-Bold", "unicode", "");
String optlist = "backgroundcolor={gray 0.8} font=" + textfield_font;
p.create_field(left_x, left_y, right_x, right_y, "date", "textfield", optlist);
By default the font size is auto, which means that initially the field height is used as the font size. When the input reaches the end of the field the font size is decreased so that the text always fits into the field.
Cookbook Full code samples can be found in the Cookbook topics interactive/form_textfield_layout and interactive/form_textfield_height.
Text field with JavaScript. In order to improve the text form field created above we automatically fill it with the current date when the page is opened. First we create an action of type JavaScript (in Acrobat: Run a JavaScript). The script option in the action's option list defines a JavaScript snippet which displays the current date in the date text field in the format month-day-year:
String optlist = "script={var d = util.printd('mmm dd yyyy', new Date()); " +
    "var date = this.getField('date'); date.value = d;}";
show_date = p.create_action("JavaScript", optlist);
In the second step we create the page. In the option list we supply the action option which attaches the show_date action created above to the trigger event open (in Acrobat: Page Open):
String optlist = "action={open " + show_date + "}";
p.begin_page_ext(pagewidth, pageheight, optlist);
Finally we create the text field as we did above. It will automatically be filled with the current date whenever the page is opened:
textfield_font = p.load_font("Helvetica-Bold", "winansi", "");
String optlist = "backgroundcolor={gray 0.8} font=" + textfield_font;
p.create_field(left_x, left_y, right_x, right_y, "date", "textfield", optlist);
Cookbook A full code sample can be found in the Cookbook topic interactive/form_textfield_fill_with_js.
Formatting Options for Text Fields. In Acrobat it is possible to specify various options for formatting the contents of a text field, such as monetary amounts, dates, or percentages. This is implemented via custom JavaScript code used by Acrobat. PDFlib does not directly support these formatting features since they are not specified in the PDF reference. However, for the benefit of PDFlib users we present some information below which will allow you to realize formatting options for text fields by supplying simple JavaScript code fragments with the action option of PDF_create_field( ). In order to apply formatting to a text field, JavaScript snippets are attached to the field as keystroke and format actions. The JavaScript code calls internal Acrobat functions whose parameters control details of the formatting. The following sample creates a keystroke action and a format action, and attaches them to a form field so that the field contents will be formatted with two decimal places and the EUR currency identifier:
keystroke_action = p.create_action("JavaScript",
    "script={AFNumber_Keystroke(2, 0, 3, 0, \"EUR \", true); }");
format_action = p.create_action("JavaScript",
    "script={AFNumber_Format(2, 0, 0, 0, \"EUR \", true); }");
String optlist = "font=" + font + " action={keystroke " + keystroke_action +
    " format=" + format_action + "}";
p.create_field(50, 500, 250, 600, "price", "textfield", optlist);
Cookbook A full code sample can be found in the Cookbook topic interactive/form_textfield_input_format.
In order to specify the various formats which are supported in Acrobat you must use appropriate functions in the JavaScript code. Table 3.4 lists the JavaScript function names for the keystroke and format actions for all supported formats; the function parameters are described in Table 3.5. These functions must be used similarly to the example above.
Table 3.4 JavaScript formatting functions for text fields
format | JavaScript functions to be used for keystroke and format actions
number | AFNumber_Keystroke(nDec, sepStyle, negStyle, currStyle, strCurrency, bCurrencyPrepend), AFNumber_Format(nDec, sepStyle, negStyle, currStyle, strCurrency, bCurrencyPrepend)
percentage | AFPercent_Keystroke(nDec, sepStyle), AFPercent_Format(nDec, sepStyle)
date | AFDate_KeystrokeEx(cFormat), AFDate_FormatEx(cFormat)
time | AFTime_Keystroke(tFormat), AFTime_FormatEx(cFormat)
special | AFSpecial_Keystroke(psf), AFSpecial_Format(psf)
Table 3.5 Parameters for the JavaScript formatting functions
parameters | explanation and possible values
nDec | Number of decimal places
sepStyle | The decimal separator style: 0 = 1,234.56; 1 = 1234.56; 2 = 1.234,56; 3 = 1234,56
negStyle | Emphasis used for negative numbers: 0 = Normal; 1 = Use red text; 2 = Show parentheses; 3 = both
strCurrency | Currency string to use, e.g. "\u20AC" for the Euro sign
bCurrencyPrepend | false = do not prepend currency symbol; true = prepend currency symbol
cFormat | A date format string. It may contain the following format placeholders, or any of the time formats listed below for tFormat: d = day of month; dd = day of month with leading zero; ddd = abbreviated day of the week; m = month as number; mm = month as number with leading zero; mmm = abbreviated month name; mmmm = full month name; yyyy = year with four digits; yy = last two digits of year
tFormat | A time format string. It may contain the following format placeholders: h = hour (0-12); hh = hour with leading zero (0-12); H = hour (0-24); HH = hour with leading zero (0-24); M = minutes; MM = minutes with leading zero; s = seconds; ss = seconds with leading zero; t = 'a' or 'p'; tt = 'am' or 'pm'
psf | Describes a few additional formats: 0 = Zip Code; 1 = Zip Code + 4; 2 = Phone Number; 3 = Social Security Number
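Using the parameter descriptions in Table 3.5, the script options for the EUR example can be generated rather than typed by hand. The following is a minimal Java sketch with a hypothetical class name. currStyle is fixed at 0 as in the example. The currency string is inserted verbatim, so in this sketch it must not contain double quotes, braces, or backslashes.

```java
/** Hypothetical helper: builds script options for AFNumber_Keystroke / AFNumber_Format. */
final class AcrobatNumberFormat {
    private AcrobatNumberFormat() {}

    static String keystrokeScript(int nDec, int sepStyle, int negStyle, String currency, boolean prepend) {
        return script("AFNumber_Keystroke", nDec, sepStyle, negStyle, currency, prepend);
    }

    static String formatScript(int nDec, int sepStyle, int negStyle, String currency, boolean prepend) {
        return script("AFNumber_Format", nDec, sepStyle, negStyle, currency, prepend);
    }

    // Parameter order follows Table 3.4; currStyle is always passed as 0.
    private static String script(String function, int nDec, int sepStyle, int negStyle,
                                 String currency, boolean prepend) {
        return "script={" + function + "(" + nDec + ", " + sepStyle + ", " + negStyle
             + ", 0, \"" + currency + "\", " + prepend + "); }";
    }
}
```

For instance, p.create_action("JavaScript", AcrobatNumberFormat.keystrokeScript(2, 0, 3, "EUR ", true)) reproduces the keystroke action from the example above.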
www.epsg.org
Well-known text (WKT). The WKT (Well-Known Text) system is descriptive and consists of a textual specification of all relevant parameters of a coordinate system. WKT is specified in the document OpenGIS Implementation Specification: Coordinate Transformation Services, which has been published as Document 01-009 by the Open Geospatial Consortium (OGC). It is available at the following location:
www.opengeospatial.org/standards/ct
WKT has also been standardized in ISO 19125-1. Although both WKT and EPSG can be used in Acrobat (and are supported in PDFlib), Acrobat 9 does not implement all possible EPSG codes. In particular, EPSG codes for geographic coordinate systems don't seem to be supported in Acrobat. In this case the use of WKT is recommended. The following Web site delivers the WKT corresponding to a particular EPSG code:
www.spatialreference.org/ref/epsg
The ETRS (European Terrestrial Reference System) geographic coordinate system is almost identical to WGS84. It can be specified as follows:
worldsystem={type=geographic wkt={
GEOGCS["ETRS_1989",
  DATUM["ETRS_1989",
    SPHEROID["GRS_1980", 6378137.0, 298.257222101]],
  PRIMEM["Greenwich", 0.0],
  UNIT["Degree", 0.0174532925199433]]
}}
Note EPSG codes for the WGS84 and ETRS systems are not shown here because Acrobat doesn't seem to support EPSG codes for geographic coordinate systems, but only for projected coordinate systems (see below).
Examples for projected coordinate systems. A projection is based on an underlying geographic coordinate system. In the following example we specify a projected coordinate system suitable for use with GPS coordinates. In central Europe the system called ETRS89 UTM zone 32 N applies. It uses the common UTM (Universal Transverse Mercator) projection, and can be expressed as follows in the worldsystem suboption of the georeference option:
worldsystem={type=projected wkt={
PROJCS["ETRS_1989_UTM_Zone_32N",
  GEOGCS["GCS_ETRS_1989",
    DATUM["D_ETRS_1989",
      SPHEROID["GRS_1980", 6378137.0, 298.257222101],
      TOWGS84[0, 0, 0, 0, 0, 0, 0]],
    PRIMEM["Greenwich", 0.0],
    UNIT["Degree", 0.0174532925199433]],
  PROJECTION["Transverse_Mercator"],
  PARAMETER["False_Easting", 500000.0],
  PARAMETER["False_Northing", 0.0],
  PARAMETER["Central_Meridian", 9.0],
  PARAMETER["Scale_Factor", 0.9996],
  PARAMETER["Latitude_Of_Origin", 0.0],
  UNIT["Meter", 1.0]]
}}
The corresponding EPSG code for this coordinate system is 25832. As an alternative to WKT, the system above can also be specified via its EPSG code as follows:
worldsystem={type=projected epsg=25832}
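The link between the WKT above and its EPSG code can be checked with some arithmetic. Standard UTM zones are 6 degrees wide. Zone n has its central meridian at 6n - 183 degrees, which gives 9 degrees for zone 32 and matches Central_Meridian above. The sketch assumes that the EPSG registry numbers the "ETRS89 / UTM zone nN" systems 25800 + n for zones 28 to 38; verify this against the registry before relying on it. The following is a minimal Java sketch with a hypothetical class name. It covers the northern hemisphere only and ignores the special zone boundaries around Norway and Svalbard.

```java
/** Hypothetical helper: UTM zone data and ETRS89 / UTM EPSG codes for a longitude. */
final class EtrsUtm {
    private EtrsUtm() {}

    /** Standard 6-degree UTM zone (1..60) for a longitude in degrees east. */
    static int zone(double longitude) {
        int z = (int) Math.floor((longitude + 180.0) / 6.0) + 1;
        return Math.min(z, 60); // longitude 180 falls into zone 60
    }

    /** Central meridian of a zone in degrees, e.g. 9 for zone 32. */
    static int centralMeridian(int zone) {
        return zone * 6 - 183;
    }

    /** Assumed EPSG code of "ETRS89 / UTM zone nN"; only zones 28 to 38 are accepted. */
    static int etrs89EpsgCode(int zone) {
        if (zone < 28 || zone > 38) {
            throw new IllegalArgumentException("no ETRS89 UTM code for zone " + zone);
        }
        return 25800 + zone;
    }

    /** worldsystem suboption in the form shown above. */
    static String worldsystem(double longitude) {
        return "worldsystem={type=projected epsg=" + etrs89EpsgCode(zone(longitude)) + "}";
    }
}
```

For a longitude of 11.5 degrees east (inside zone 32) this yields the worldsystem={type=projected epsg=25832} option shown above.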
1. See www.unicode.org
4.1 Overview
[Figure: Characters (e.g. U+0067 LATIN SMALL LETTER G) and Glyphs]
Cookbook A full code sample can be found in the Cookbook topic text_output/process_utf8.
Unicode encoding schemes and the Byte Order Mark (BOM). Computer architectures differ in the ordering of bytes, i.e. whether the bytes constituting a larger value (16- or 32-bit) are stored with the most significant byte first (big-endian) or the least significant byte first (little-endian). A common example for big-endian architectures is PowerPC, while the x86 architecture is little-endian. Since UTF-16 and UTF-32 are based on code units which are larger than a single byte, the byte-ordering issue comes into play here (UTF-8 consists of single bytes and therefore has no byte-order variants). An encoding scheme (note the difference from encoding form above) specifies the encoding form plus the byte ordering. For example, UTF-16BE stands for UTF-16 with big-endian byte ordering. If the byte ordering is not known in advance it can be specified by means of the code point U+FEFF, which is called Byte Order Mark (BOM). Although a BOM is not required in UTF-8, it may be present as well, and can be used to identify a stream of bytes as UTF-8. Table 4.1 lists the representation of the BOM for various encoding forms.
Table 4.1 Byte order marks for various Unicode encoding forms
Encoding form | Byte order mark (hex)
UTF-8 | EF BB BF
UTF-16 big-endian | FE FF
UTF-16 little-endian | FF FE
UTF-32 big-endian | 00 00 FE FF
UTF-32 little-endian | FF FE 00 00
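Table 4.1 translates directly into a prefix check on the first bytes of a string. The following is a minimal Java sketch with a hypothetical class name. The UTF-32 patterns are tested first because the UTF-32 little-endian mark FF FE 00 00 begins with the UTF-16 little-endian mark FF FE. As a consequence, a UTF-16LE string whose first character after the BOM is U+0000 cannot be told apart and is reported as UTF-32LE here.

```java
/** Hypothetical helper: identifies the encoding scheme from a leading byte order mark. */
final class BomSniffer {
    private BomSniffer() {}

    /** Returns the scheme name, or null if the data does not start with a BOM. */
    static String detect(byte[] data) {
        // Longer UTF-32 marks first, since FF FE 00 00 also starts with FF FE.
        if (startsWith(data, 0x00, 0x00, 0xFE, 0xFF)) return "UTF-32BE";
        if (startsWith(data, 0xFF, 0xFE, 0x00, 0x00)) return "UTF-32LE";
        if (startsWith(data, 0xEF, 0xBB, 0xBF))       return "UTF-8";
        if (startsWith(data, 0xFE, 0xFF))             return "UTF-16BE";
        if (startsWith(data, 0xFF, 0xFE))             return "UTF-16LE";
        return null;
    }

    // Compares unsigned byte values against the expected prefix.
    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) {
                return false;
            }
        }
        return true;
    }
}
```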
> Non-Unicode CMaps for Chinese, Japanese, and Korean text (see Section 4.6, Chinese, Japanese, and Korean Encodings, page 100) must be avoided since the wrapper will always supply Unicode to the PDFlib core; only Unicode CMaps can be used.
The overall effect is that clients can provide plain Unicode strings to PDFlib functions without any additional configuration or parameter settings. The distinction between hypertext strings and name strings in the function descriptions is not relevant for Unicode-aware language bindings.
Unicode conversion functions. If you must deal with strings in encodings other than Unicode, you must convert them to Unicode before passing them to PDFlib. The language-specific sections in Chapter 2, PDFlib Language Bindings, page 25, provide more details regarding useful Unicode string conversion methods provided by common language environments.
fied in hypertextencoding will be applied to name strings as well. This can be used, for example, to specify font or file names in Shift-JIS. In C the length parameter must be 0 for UTF-8 strings. If it is different from 0 the string will be interpreted as UTF-16. In all other non-Unicode-aware language bindings there is no length parameter available in the API functions, and name strings must always be supplied in UTF-8 format. In order to create Unicode name strings in this case you can use the PDF_utf16_to_utf8( ) utility function to convert UTF-16 to UTF-8 (see below).
Unicode conversion functions. In non-Unicode-aware language bindings PDFlib offers the PDF_utf16_to_utf8( ), PDF_utf8_to_utf16( ), and PDF_utf32_to_utf16( ) conversion functions which can be used to create UTF-8 or UTF-16 strings for passing them to PDFlib. The language-specific sections in Chapter 2, PDFlib Language Bindings, page 25, provide more details regarding useful Unicode string conversion methods provided by common language environments.
Text format for content and hypertext strings. Unicode strings in PDFlib can be supplied in the UTF-8, UTF-16, or UTF-32 formats with any byte ordering. The choice of format can be controlled with the textformat parameter for all text on page descriptions, and the hypertextformat parameter for interactive elements. Table 4.2 lists the values which are supported for both of these parameters. The default for the [hyper]textformat parameter is auto. Use the usehypertextencoding parameter to enforce the same behavior for name strings. The default for the hypertextencoding parameter is auto.
Table 4.2 Values for the textformat and hypertextformat parameters
[hyper]textformat | explanation
bytes | One byte in the string corresponds to one character. This is mainly useful for 8-bit encodings and symbolic fonts. A UTF-8 BOM at the start of the string will be evaluated and then removed.
utf8 | Strings are expected in UTF-8 format. Invalid UTF-8 sequences will trigger an exception if glyphcheck=error, or will be deleted otherwise.
ebcdicutf8 | Strings are expected in EBCDIC-coded UTF-8 format (only on iSeries and zSeries).
utf16 | Strings are expected in UTF-16 format. A Unicode Byte Order Mark (BOM) at the start of the string will be evaluated and then removed. If no BOM is present the string is expected in the machine's native byte ordering (on Intel x86 architectures the native byte order is little-endian, while on Sparc and PowerPC systems it is big-endian).
utf16be | Strings are expected in UTF-16 format in big-endian byte ordering. There is no special treatment for Byte Order Marks.
utf16le | Strings are expected in UTF-16 format in little-endian byte ordering. There is no special treatment for Byte Order Marks.
auto | Content strings: equivalent to bytes for 8-bit encodings and non-Unicode CMaps, and utf16 for wide-character addressing (unicode, glyphid, or a UCS2 or UTF16 CMap). Hypertext strings: UTF-8 and UTF-16 strings with BOM will be detected (in C UTF-16 strings must be terminated with a double-null). If the string does not start with a BOM, it will be interpreted as an 8-bit encoded string according to the hypertextencoding parameter. This setting will provide proper text interpretation in most environments which do not use Unicode natively.
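The utf16 rule in Table 4.2 is: evaluate and strip a BOM, otherwise assume the machine's native byte order. That rule can be mimicked in Java to see its effect. This is an illustration only. Java strings are already Unicode, so the Java binding does not require this step, and the class name is hypothetical. By contrast, the utf16be and utf16le values would decode with a fixed byte order and would not inspect a BOM.

```java
import java.nio.ByteOrder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: decodes bytes following the textformat=utf16 rule from Table 4.2. */
final class Utf16Text {
    private Utf16Text() {}

    static String decode(byte[] s) {
        if (s.length >= 2) {
            int b0 = s[0] & 0xFF;
            int b1 = s[1] & 0xFF;
            // A BOM determines the byte order and is removed from the result.
            if (b0 == 0xFE && b1 == 0xFF) {
                return new String(s, 2, s.length - 2, StandardCharsets.UTF_16BE);
            }
            if (b0 == 0xFF && b1 == 0xFE) {
                return new String(s, 2, s.length - 2, StandardCharsets.UTF_16LE);
            }
        }
        // No BOM: fall back to the byte order of the machine running the code.
        Charset nativeOrder = ByteOrder.nativeOrder() == ByteOrder.BIG_ENDIAN
                ? StandardCharsets.UTF_16BE : StandardCharsets.UTF_16LE;
        return new String(s, nativeOrder);
    }
}
```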
Although the textformat setting is in effect for all encodings, it will be most useful for unicode encoding. Table 4.3 details the interpretation of text strings for various combinations of encodings and textformat settings.
Table 4.3 Relationship of encodings and text format
[hypertext]encoding | textformat=bytes | textformat=utf8, utf16, utf16be, or utf16le
All string types:
auto | see section Automatic encoding, page 91 | see section Automatic encoding, page 91
8-bit and builtin | 8-bit codes | Convert Unicode values to 8-bit codes according to the chosen encoding¹. PDFlib will throw an exception if it is not a content string and no 8-bit encoding is found in the font (8-bit encodings are available in Type 1 and Type 3 fonts).
U+XXXX | 8-bit codes will be added to the offset XXXX to address Unicode values | Convert Unicode values to 8-bit codes according to the chosen Unicode offset
unicode and UCS2 or UTF16 CMaps | 8-bit codes are Unicode values from U+0000 to U+00FF | Any Unicode value, encoded according to the chosen text format¹
any other CMap (not Unicode-based) | Any single- or multibyte codes according to the chosen CMap | PDFlib will throw an exception
Only content strings:
glyphid | 8-bit codes are glyph ids from 0 to 255 | Unicode values will be interpreted as glyph ids²
1. If the Unicode character is not available in the font, PDFlib will throw an exception or replace it subject to the glyphcheck option.
2. If the glyph id is not available in the font, PDFlib will issue a warning and replace it with glyph id 0.
Strings in option lists. Strings within option lists require special attention since in non-Unicode-aware language bindings they cannot be expressed as Unicode strings in UTF-16 format, but only as byte strings. For this reason UTF-8 is used for Unicode options. By looking for a BOM at the beginning of an option, PDFlib decides how to interpret it. The BOM will be used to determine the format of the string, and the string type (content string, hypertext string, or name string as defined above) will be used to determine the appropriate encoding. More precisely, interpreting a string option works as follows:
> If the option starts with a UTF-8 BOM (0xEF 0xBB 0xBF) it will be interpreted as UTF-8. On EBCDIC-based systems: if the option starts with an EBCDIC UTF-8 BOM (0x57 0x8B 0xAB) it will be interpreted as EBCDIC UTF-8.
If no BOM is found, string interpretation depends on the type of string:
> Content strings will be interpreted according to the applicable encoding option or the encoding of the corresponding font (whichever is present).
> Hypertext strings will be interpreted according to the hypertextencoding parameter or option.
> Name strings will be interpreted according to the hypertext settings if usehypertextencoding=true, and host encoding otherwise.
Note that the characters { and } require special handling within strings in option lists, and must be preceded by a \ character if they are used within a string option. This requirement remains for legacy encodings such as Shift-JIS: all occurrences of the byte values 0x7B and 0x7D must be preceded with 0x5C. For this reason the use of UTF-8 for options is recommended (instead of Shift-JIS and other legacy encodings).
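The brace rule above can be applied automatically before user-supplied text is placed into an option list. The following is a minimal Java sketch with a hypothetical class name. It escapes only { and }, the two characters this section names. It does not cover whether other characters, such as the backslash itself, need treatment. The option( ) helper also wraps the value in braces so that values containing spaces can be used, as with contents={Get the Kraxi Paper Plane!} earlier in this tutorial.

```java
/** Hypothetical helper: prepares string values for use inside PDFlib option lists. */
final class OptlistString {
    private OptlistString() {}

    /** Precedes every { and } with a backslash. */
    static String escapeBraces(String value) {
        StringBuilder sb = new StringBuilder(value.length());
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (c == '{' || c == '}') {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    /** Builds key={value} with the braces inside the value escaped. */
    static String option(String key, String value) {
        return key + "={" + escapeBraces(value) + "}";
    }
}
```

For instance, OptlistString.option("tooltip", "Print {all} pages") yields tooltip={Print \{all\} pages}.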
Table 4.4 Availability of glyphs for predefined encodings in several classes of fonts: some languages cannot be represented with Acrobat's core fonts.
code page | supported languages
winansi | identical to cp1252 (superset of iso8859-1)
macroman | Mac Roman encoding, the original Macintosh character set
macroman_apple | similar to macroman, but replaces currency with Euro and includes additional mathematical/greek symbols
ebcdic | EBCDIC code page 1047
ebcdic_37 | EBCDIC code page 037
pdfdoc | PDFDocEncoding
iso8859-1 | (Latin-1) Western European languages
iso8859-2 | (Latin-2) Slavic languages of Central Europe
iso8859-3 | (Latin-3) Esperanto, Maltese
iso8859-4 | (Latin-4) Estonian, the Baltic languages, Greenlandic
iso8859-5 | Bulgarian, Russian, Serbian
iso8859-6 | Arabic
iso8859-7 | Modern Greek
iso8859-8 | Hebrew and Yiddish
iso8859-9 | (Latin-5) Western European, Turkish
iso8859-10 | (Latin-6) Nordic languages
iso8859-13 | (Latin-7) Baltic languages
iso8859-14 | (Latin-8) Celtic
iso8859-15 | (Latin-9) Adds Euro as well as French and Finnish characters to Latin-1
iso8859-16 | (Latin-10) Hungarian, Polish, Romanian, Slovenian
cp1250 | Central European
cp1251 | Cyrillic
cp1252 | Western European (same as winansi)
cp1253 | Greek
cp1254 | Turkish
cp1255 | Hebrew
cp1256 | Arabic
cp1257 | Baltic
cp1258 | Viet Nam
[Further columns: glyph availability in Acrobat core fonts¹, PostScript 3 fonts², OpenType Pro fonts³, and TrueType Big Fonts⁴]
1. The information in the table relates to the Times and Helvetica font families. The Courier font family which is used in Acrobat contains fewer glyphs, and does not cover iso8859-2, iso8859-4, iso8859-9, iso8859-10, iso8859-13, and iso8859-16.
2. Extended Adobe Latin character set (CE-Fonts), generally Type 1 Fonts shipped with PostScript 3 devices
3. Adobe OpenType Pro fonts contain more glyphs than regular OpenType fonts.
4. Windows TrueType fonts containing large glyph complements, e.g. Tahoma
For symbol fonts the keyword auto will be mapped to builtin encoding. While automatic encoding is convenient in many circumstances, using this method will make your PDFlib client programs inherently non-portable.
Tapping system code pages. PDFlib can be instructed to fetch code page definitions from the system and transform them appropriately for internal use. This is very convenient since it frees you from implementing the code page definition yourself. Instead of supplying the name of a built-in or user-defined encoding for PDF_load_font( ), simply use an encoding name which is known to the system. This feature is only available on selected platforms, and the syntax for the encoding string is platform-specific:
> On Windows the encoding name is cp<number>, where <number> is the number of any single-byte code page installed on the system (see Section 6.4.2, Custom CJK Fonts, page 141, for information on multi-byte Windows code pages):
font = p.load_font("Helvetica", "cp1250", "");
Single-byte code pages will be transformed into an internal 8-bit encoding, while multi-byte code pages will be mapped to Unicode at runtime. The text must be supplied in a format which is compatible with the chosen code page (e.g. SJIS for cp932).
> On IBM eServer iSeries any Coded Character Set Identifier (CCSID) can be used. The CCSID must be supplied as a string, and PDFlib will apply the prefix IBMCCSID to the supplied code page number. PDFlib will also add leading 0 characters if the code page number uses fewer than 5 characters. Supplying 0 (zero) as the code page number will cause the current job's encoding to be used:
font = p.load_font("Helvetica", "273", "");
> On IBM eServer zSeries with USS or MVS any Coded Character Set Identifier (CCSID) can be used. The CCSID must be supplied as a string, and PDFlib will pass the supplied code page name to the system literally without applying any change:
font = p.load_font("Helvetica", "IBM-273", "");
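The iSeries naming rule above (prefix IBMCCSID, pad the number with leading zeros to five characters) can be made concrete with a small Java helper. This is an illustration only. PDFlib performs this transformation internally, and a client still passes "273" to load_font. The class name is hypothetical, and the helper assumes the argument is a plain string of digits.

```java
/** Hypothetical helper: shows the CCSID name PDFlib derives on iSeries. */
final class IseriesCcsid {
    private IseriesCcsid() {}

    static String encodingName(String codepage) {
        StringBuilder digits = new StringBuilder(codepage);
        // Pad with leading zeros until the number has five characters.
        while (digits.length() < 5) {
            digits.insert(0, '0');
        }
        return "IBMCCSID" + digits;
    }
}
```

For the example above, "273" corresponds to the name IBMCCSID00273.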
User-defined 8-bit encodings. In addition to predefined encodings PDFlib supports user-defined 8-bit encodings. These are the way to go if you want to deal with some character set which is not internally available in PDFlib, such as EBCDIC character sets different from those supported internally in PDFlib. PDFlib supports encoding tables defined by PostScript glyph names, as well as tables defined by Unicode values. The following tasks must be done before a user-defined encoding can be used in a PDFlib program (alternatively the encoding can also be constructed at runtime using PDF_encoding_set_char( )):
> Generate a description of the encoding in a simple text format.
> Configure the encoding in the PDFlib resource file (see Section 3.1.3, Resource Configuration and File Searching, page 52) or via PDF_set_parameter( ).
> Provide a font (metrics and possibly outline file) that supports all characters used in the encoding.
The encoding file simply lists glyph names and numbers line by line. The following excerpt shows the start of an encoding definition:
% Encoding definition for PDFlib, based on glyph names
% name      code    Unicode (optional)
space       32      0x0020
exclam      33      0x0021
...
If no Unicode value has been specified PDFlib will search for a suitable Unicode value in its internal tables. The next example shows a snippet from a Unicode code page:
% Code page definition for PDFlib, based on Unicode values
% Unicode   code
0x0020      32
0x0021      33
...
More formally, the contents of an encoding or code page file are governed by the following rules:
> Comments are introduced by a percent % character, and terminated by the end of the line.
> The first entry in each line is either a PostScript glyph name or a hexadecimal Unicode value composed of a 0x prefix and four hex digits (upper or lower case). This is followed by whitespace and a hexadecimal (0x00 to 0xFF) or decimal (0 to 255) character code. Optionally, name-based encoding files may contain a third column with the corresponding Unicode value.
> Character codes which are not mentioned in the encoding file are assumed to be undefined. Alternatively, a Unicode value of 0x0000 or the character name .notdef can be provided for unused slots.
As a naming convention we refer to name-based tables as encoding files (*.enc), and Unicode-based tables as code page files (*.cpg), although PDFlib treats both kinds in the same way, and doesn't care about file names. In fact, PDFlib will automatically convert between name-based encoding files and Unicode-based code page files whenever necessary. This conversion is based on Adobe's standard list of PostScript glyph names (the Adobe Glyph List, or AGL), but non-AGL names can also be used. PDFlib will assign free Unicode values to these non-AGL names, and will adjust the values when reading an OpenType font file which includes a mapping from glyph names to Unicode values. PDFlib's internal glyph list contains more than 6500 glyph names. Encoding files are required for PostScript fonts with non-standard glyph names, while code pages are more convenient when dealing with Unicode-based TrueType or OpenType fonts.
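As a sketch of the file format rules above, the following Java class reads an encoding or code page file into a 256-slot table. It is a hypothetical illustration, not PDFlib's parser: it keeps glyph names and 0xXXXX values as strings, ignores the optional third column, and does not perform the AGL-based conversion between glyph names and Unicode values.

```java
// Hypothetical reader for the encoding/code page text format described
// above; an illustration of the rules, not PDFlib's own parser. Glyph names
// are not resolved to Unicode (that would require a glyph list such as the
// AGL), and the optional third column is ignored.
final class EncodingFileSketch {
    private EncodingFileSketch() {}

    /** Returns a 256-slot table of glyph names or 0xXXXX values; null = undefined. */
    static String[] parse(String text) {
        String[] slots = new String[256];
        for (String rawLine : text.split("\r?\n")) {
            // Comments start with '%' and end at the end of the line.
            int pct = rawLine.indexOf('%');
            String line = (pct >= 0 ? rawLine.substring(0, pct) : rawLine).trim();
            if (line.isEmpty()) {
                continue;
            }
            String[] cols = line.split("\\s+");
            if (cols.length < 2) {
                throw new IllegalArgumentException("missing character code: " + rawLine);
            }
            String first = cols[0];
            if (first.startsWith("0x") && !first.matches("0x[0-9A-Fa-f]{4}")) {
                throw new IllegalArgumentException("bad Unicode value: " + first);
            }
            int code = parseCode(cols[1]);
            // A Unicode value of 0x0000 or the name .notdef marks an unused slot.
            if (first.equals(".notdef") || first.equals("0x0000")) {
                continue;
            }
            slots[code] = first;
        }
        return slots;
    }

    // Accepts hexadecimal (0x00 to 0xFF) or decimal (0 to 255) codes.
    private static int parseCode(String s) {
        int code = s.startsWith("0x") ? Integer.parseInt(s.substring(2), 16) : Integer.parseInt(s);
        if (code < 0 || code > 255) {
            throw new IllegalArgumentException("character code out of range: " + s);
        }
        return code;
    }
}
```

For example, parsing the lines space 32 0x0020 and exclam 33 0x0021 fills slots 32 and 33, while all other slots stay null (undefined).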
\x followed by two hexadecimal digits specifying a byte value
\ followed by three octal digits (up to \377) specifying a byte value
Escape sequences will not be converted by default; you must explicitly set the escapesequence parameter or option to true if you want to use escape sequences in content strings:
p.set_parameter("escapesequence", "true");
Cookbook A full code sample can be found in the Cookbook topic fonts/escape_sequences.
Escape sequences will be evaluated in all content strings, hypertext strings, and name strings after BOM detection, but before converting to the target format. If textformat=utf16le or utf16be, escape sequences must be expressed as two byte values according to the selected format. If textformat=utf8, the resulting code will not be converted to UTF-8.
If an escape sequence cannot be resolved (e.g. \x followed by invalid hex digits) an exception will be thrown. For content strings the behavior is controlled by the glyphcheck and errorpolicy settings.
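The two escape forms listed above can be sketched as a small byte-level decoder. This is an illustration under stated assumptions: the class is hypothetical, it treats its input as 8-bit text, and it rejects every other backslash sequence, whereas PDFlib's full escape table (not reproduced in this section) accepts additional forms. Like PDFlib, it throws an exception for invalid hex or octal digits; the glyphcheck and errorpolicy settings are not modeled.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical sketch decoding two of the escape forms listed above:
//   \x + two hex digits      -> one byte
//   \  + three octal digits  -> one byte (at most \377)
// Any other backslash sequence is rejected, because this sketch does not
// implement the rest of PDFlib's escape table.
final class EscapeSketch {
    private EscapeSketch() {}

    static byte[] decode(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (c != '\\') {
                out.write(c & 0xFF);  // input is treated as 8-bit text
                i++;
                continue;
            }
            if (i + 1 < s.length() && s.charAt(i + 1) == 'x') {
                out.write(parseDigits(s, i + 2, 2, 16));
                i += 4;
            } else if (i + 1 < s.length() && isOctal(s.charAt(i + 1))) {
                int v = parseDigits(s, i + 1, 3, 8);
                if (v > 0377) {
                    throw new IllegalArgumentException("octal escape above \\377 at " + i);
                }
                out.write(v);
                i += 4;
            } else {
                throw new IllegalArgumentException("unsupported escape at index " + i);
            }
        }
        return out.toByteArray();
    }

    private static boolean isOctal(char c) {
        return c >= '0' && c <= '7';
    }

    private static int parseDigits(String s, int start, int count, int radix) {
        if (start + count > s.length()) {
            throw new IllegalArgumentException("truncated escape at " + start);
        }
        int v = 0;
        for (int k = start; k < start + count; k++) {
            int d = Character.digit(s.charAt(k), radix);
            if (d < 0) {
                throw new IllegalArgumentException("invalid digit at " + k);
            }
            v = v * radix + d;
        }
        return v;
    }
}
```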
Note Code points 128-159 (decimal) or 0x80-0x9F (hexadecimal) do not reference winansi code points. In Unicode they do not refer to printable characters, but only to control characters.
The following are examples of valid character references along with a description of the resulting character:
&#173;     soft hyphen
&#xAD;     soft hyphen
&shy;      soft hyphen
&#229;     letter a with small circle above (decimal)
&#xE5;     letter a with small circle above (hexadecimal, lowercase x)
&#XE5;     letter a with small circle above (hexadecimal, uppercase X)
&#x20AC;   Euro glyph (hexadecimal)
&#8364;    Euro glyph (decimal)
&euro;     Euro glyph (entity name)
&lt;       less than sign
&gt;       greater than sign
&amp;      ampersand sign
&Alpha;    Greek Alpha
Note Although you can reference any Unicode character with character references (e.g. Greek characters and mathematical symbols), the font will not automatically be switched. In order to actually use such characters you must explicitly select an appropriate font if the current font does not contain the specified characters.
In addition to the HTML-style references mentioned above, PDFlib supports custom character entity references which can be used to specify control characters for Textflows. Table 4.6 lists these additional character references. PDFlib also supports custom character entity references for controlling Bidi formatting (see Table 6.3).
If a character reference cannot be resolved (e.g. &# followed by invalid decimal digits, or & followed by an unknown character name) an exception will be thrown. For content strings the behavior is controlled by the glyphcheck and errorpolicy settings.
1. See www.w3.org/TR/REC-html40/charset.html#h-5.3
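The following Java sketch shows how the three kinds of character references (decimal, hexadecimal, and entity name) could be resolved to Unicode text. The class name and its tiny entity table are hypothetical; PDFlib knows many more entity names (plus the custom Textflow and Bidi entities) and lets glyphcheck and errorpolicy decide what happens on errors, whereas this sketch always throws, roughly like glyphcheck=error. It requires Java 9 or later for Map.of( ).

```java
import java.util.Map;

// Hypothetical sketch of resolving HTML-style character references:
//   &#NNN;            decimal code point
//   &#xHHHH; &#XHHHH;  hexadecimal code point
//   &name;            entity name (only a handful of names included here)
// Not PDFlib's implementation; every error throws an exception.
final class CharRefSketch {
    private CharRefSketch() {}

    private static final Map<String, Integer> ENTITIES = Map.of(
            "shy", 0x00AD, "euro", 0x20AC, "lt", 0x3C, "gt", 0x3E,
            "amp", 0x26, "Alpha", 0x0391, "aring", 0x00E5);

    static String resolve(String text) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i);
            if (c != '&') {
                out.append(c);
                i++;
                continue;
            }
            int semi = text.indexOf(';', i);
            if (semi < 0) {
                throw new IllegalArgumentException("unterminated reference at " + i);
            }
            out.appendCodePoint(codePointOf(text.substring(i + 1, semi)));
            i = semi + 1;
        }
        return out.toString();
    }

    private static int codePointOf(String body) {
        try {
            if (body.startsWith("#x") || body.startsWith("#X")) {
                return checked(Integer.parseInt(body.substring(2), 16));
            }
            if (body.startsWith("#")) {
                return checked(Integer.parseInt(body.substring(1)));
            }
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("invalid numeric reference: &" + body + ";", e);
        }
        Integer cp = ENTITIES.get(body);
        if (cp == null) {
            throw new IllegalArgumentException("unknown entity: &" + body + ";");
        }
        return cp;
    }

    private static int checked(int cp) {
        if (!Character.isValidCodePoint(cp)) {
            throw new IllegalArgumentException("code point out of range: " + cp);
        }
        return cp;
    }
}
```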
Table 4.6 Control characters and their meaning in Textflows
Unicode character | entity name | equiv. Textflow option | meaning within Textflows in Unicode-compatible fonts
U+0020 | SP, space | space | align words and break lines
U+00A0 | NBSP, nbsp | (none) | (no-break space) space character which will not break lines
U+0009 | HT, hortab | (none) | horizontal tab: will be processed according to the ruler, tabalignchar, and tabalignment options
U+002D | HY, hyphen | (none) | separator character for hyphenated words
U+00AD | SHY, shy | (none) | (soft hyphen) hyphenation opportunity, only visible at line breaks
U+000B, U+2028 | VT, verttab; LS, linesep | nextline | (next line) forces a new line
U+000A, U+000D, U+000D and U+000A, U+0085, U+2029 | LF, linefeed; CR, return; CRLF; NEL, newline; PS, parasep | nextparagraph | (next paragraph) same effect as nextline; in addition, the parindent option will affect the next line
U+000C | FF, formfeed | return | PDF_fit_textflow( ) will stop, and return the string _nextpage.
Glyph name references. A font may contain glyphs which are not directly accessible because the corresponding Unicode values are not known in advance (e.g. PUA assignments) or because they do not have Unicode values in the font at all. Although all glyphs in a font can be addressed via the glyphid encoding, this is very cumbersome and does not fit Unicode workflows. Glyph name references are a useful alternative. They are similar to character references, but use a slightly different syntax and refer to the glyph by name (in the first example, the first period is part of the syntax, while the second period is part of the glyph name T.swash):
&.T.swash; &.directional;
Glyph name references will not be converted by default; you must explicitly set the charref parameter or option to true if you want to use glyph name references in content strings:
p.set_parameter("charref", "true");
Glyph name references are useful for alternate forms (e.g. swash characters, tabular figures) and glyphs without any specific Unicode semantics (symbols, icons, and ornaments). The general syntax is &.<name>; where name is a glyph name which will be substituted as follows:
> Font-specific glyph names from OpenType fonts (but not OpenType CID fonts) can be used for content strings (since these are always related to a particular font);
> Glyph names used in encodings can be used for content strings;
> Names from the Adobe Glyph List (including the uniXXXX and u1XXXX forms) plus certain common misnamed glyph names will always be accepted for content strings and hypertext strings.
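A hypothetical Java sketch of the syntax side of glyph name references: it extracts the names from &.<name>; references and interprets AGL-style uniXXXX names (four uppercase hex digits) as Unicode values. Resolving font-specific names such as T.swash requires the actual font, which is outside the scope of this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.OptionalInt;

// Hypothetical helper, not PDFlib code: finds &.<name>; references and
// decodes AGL-style "uniXXXX" glyph names. Font-specific names can only be
// resolved against a real font, so they are returned as plain names.
final class GlyphRefSketch {
    private GlyphRefSketch() {}

    // Collects the names from all &.<name>; references in the text.
    static List<String> glyphNames(String text) {
        List<String> names = new ArrayList<>();
        int start = text.indexOf("&.");
        while (start >= 0) {
            int semi = text.indexOf(';', start + 2);
            if (semi <= start + 2) {
                throw new IllegalArgumentException("empty or unterminated glyph name reference at " + start);
            }
            // The name runs from after "&." up to ";" and may itself contain
            // periods, as in "T.swash".
            names.add(text.substring(start + 2, semi));
            start = text.indexOf("&.", semi + 1);
        }
        return names;
    }

    // AGL-style "uniXXXX" names encode a Unicode value directly.
    static OptionalInt uniValue(String glyphName) {
        if (glyphName.matches("uni[0-9A-F]{4}")) {
            return OptionalInt.of(Integer.parseInt(glyphName.substring(3), 16));
        }
        return OptionalInt.empty();
    }
}
```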
If no glyph can be found for the name specified in a glyph name reference, an exception will be thrown. For content strings the behavior is controlled by the glyphcheck and errorpolicy settings. Glyph name references cannot be used with glyphid or builtin encoding.
Using character and glyph name references. Character and glyph name references can be used in all content strings, hypertext strings, and name strings, e.g. in text which will be placed on the page using the show or Textflow functions, as well as in text supplied to the hypertext functions.
Character references will not be processed in text with builtin encoding. However, you can use glyph name references for symbolic fonts by using unicode encoding. In this case all glyphs must be addressed by name; you cannot mix numerical codes and glyph names.
For symbolic Type 3 fonts, glyph name references require Unicode assignments for the glyphs in the font, and unicode encoding. The Unicode assignments can be achieved by defining an 8-bit encoding which assigns Unicode values to the glyphs, although this encoding won't be used for the Type 3 font. In non-Unicode-aware language bindings this also requires textformat=bytes.
Character and glyph name references can also be enabled for Textflow processing by supplying the charref option to PDF_add/create_textflow( ) (either directly or as an inline option), PDF_fit_textline( ), or PDF_fill_textblock( ). If character and glyph name references are enabled, you can supply numeric references, entity references, and glyph name references in 8-bit-encoded text:
p.set_parameter("charref", "true");
font = p.load_font("Helvetica", "winansi", "");
if (font == -1) { ... }
p.setfont(font, 24);
p.show_xy("Price: 500&euro;", 50, 500);
Character references will not be substituted in option lists, but they will be recognized in options with the Unichar data type (without the & and ; decoration). This recognition will always be enabled; it is not subject to the charref parameter or option. When an & character is found in the text which does not introduce a numerical reference, character reference, or glyph name reference, an exception will be thrown if glyphcheck=error. In other words, by setting glyphcheck=none you can use both character or glyph name references and isolated & characters in the same text.
Glyph ID addressing for TrueType and OpenType fonts. GIDs are used internally in TrueType and OpenType fonts, and uniquely address individual glyphs within a font. GID addressing frees the developer from any restriction in a given encoding scheme, and provides access to all glyphs which the font designer put into the font file. However, in order to use glyph IDs you must be familiar with the font's internal glyph numbers. Generally there is no fixed relationship between GIDs and more common addressing schemes, such as Windows encodings or Unicode. The burden of converting application-specific codes to GIDs is placed on the PDFlib user.
Glyph ID addressing for Type 1 and Type 3 fonts. PDFlib also supports glyph IDs for Type 1 and Type 3 fonts, which traditionally did not support this concept. This feature is mainly useful for printing font overview tables by querying the number of glyphs and iterating over all glyph IDs.
As an alternative method for configuring access to the CJK CMap files you can set the PDFLIBRESOURCEFILE environment variable to point to a UPR configuration file which contains a suitable SearchPath definition.
1. See partners.adobe.com/asn/tech/type/cidfonts.jsp for a wealth of resources related to CID fonts, including tables with all supported glyphs (search for character collection).
Table 4.7 Predefined CMaps for Japanese, Chinese, and Korean text (from the PDF Reference)
locale | CMap name | character set and text format
Simplified Chinese | UniGB-UCS2-H, UniGB-UCS2-V | Unicode (UCS-2) encoding for the Adobe-GB1 character collection
Simplified Chinese | UniGB-UTF16-H, UniGB-UTF16-V | Unicode (UTF-16BE) encoding for the Adobe-GB1 character collection. Contains mappings for all characters in the GB18030-2000 character set.
Simplified Chinese | GB-EUC-H, GB-EUC-V | Microsoft Code Page 936 (charset 134), GB 2312-80 character set, EUC-CN encoding
Simplified Chinese | GBpc-EUC-H, GBpc-EUC-V | Macintosh, GB 2312-80 character set, EUC-CN encoding, Script Manager code 2
Simplified Chinese | GBK-EUC-H, -V | Microsoft Code Page 936 (charset 134), GBK character set, GBK encoding
Simplified Chinese | GBKp-EUC-H, GBKp-EUC-V | Same as GBK-EUC-H, but replaces half-width Latin characters with proportional forms and maps code 0x24 to dollar ($) instead of yuan (¥).
Simplified Chinese | GBK2K-H, -V | GB 18030-2000 character set, mixed 1-, 2-, and 4-byte encoding
Traditional Chinese | UniCNS-UCS2-H, UniCNS-UCS2-V | Unicode (UCS-2) encoding for the Adobe-CNS1 character collection
Traditional Chinese | UniCNS-UTF16-H, UniCNS-UTF16-V | Unicode (UTF-16BE) encoding for the Adobe-CNS1 character collection. Contains mappings for all of HKSCS-2001 (2- and 4-byte character codes)
Traditional Chinese | B5pc-H, -V | Macintosh, Big Five character set, Big Five encoding, Script Manager code 2
Traditional Chinese | HKscs-B5-H, HKscs-B5-V | Hong Kong SCS (Supplementary Character Set), an extension to the Big Five character set and encoding
Traditional Chinese | ETen-B5-H, -V | Microsoft Code Page 950 (charset 136), Big Five with ETen extensions
Traditional Chinese | ETenms-B5-H, ETenms-B5-V | Same as ETen-B5-H, but replaces half-width Latin characters with proportional forms
Traditional Chinese | CNS-EUC-H, -V | CNS 11643-1992 character set, EUC-TW encoding
Japanese | UniJIS-UCS2-H, -V | Unicode (UCS-2) encoding for the Adobe-Japan1 character collection
Japanese | UniJIS-UCS2-HW-H, UniJIS-UCS2-HW-V | Same as UniJIS-UCS2-H, but replaces proportional Latin characters with half-width forms
Japanese | UniJIS-UTF16-H, UniJIS-UTF16-V | Unicode (UTF-16BE) encoding for the Adobe-Japan1 character collection. Contains mappings for all characters in the JIS X 0213:2000 character set.
Japanese | 83pv-RKSJ-H | Mac, JIS X 0208 with KanjiTalk6 extensions, Shift-JIS, Script Manager code 1
Japanese | 90ms-RKSJ-H, 90ms-RKSJ-V | Microsoft Code Page 932 (charset 128), JIS X 0208 character set with NEC and IBM extensions
Japanese | 90msp-RKSJ-H, 90msp-RKSJ-V | Same as 90ms-RKSJ-H, but replaces half-width Latin characters with proportional forms
Japanese | 90pv-RKSJ-H | Mac, JIS X 0208 with KanjiTalk7 extensions, Shift-JIS, Script Manager code 1
Japanese | Add-RKSJ-H, -V | JIS X 0208 character set with Fujitsu FMR extensions, Shift-JIS encoding
Japanese | EUC-H, -V | JIS X 0208 character set, EUC-JP encoding
Japanese | Ext-RKSJ-H, -V | JIS C 6226 (JIS78) character set with NEC extensions, Shift-JIS encoding
Japanese | H, V | JIS X 0208 character set, ISO-2022-JP encoding
Korean | UniKS-UCS2-H, -V | Unicode (UCS-2) encoding for the Adobe-Korea1 character collection
Korean | UniKS-UTF16-H, -V | Unicode (UTF-16BE) encoding for the Adobe-Korea1 character collection
Korean | KSC-EUC-H, -V | KS X 1001:1992 character set, EUC-KR encoding
Korean | KSCms-UHC-H, KSCms-UHC-V | Microsoft Code Page 949 (charset 129), KS X 1001:1992 character set plus 8822 additional hangul, Unified Hangul Code (UHC) encoding
Korean | KSCms-UHC-HW-H, KSCms-UHC-HW-V | Same as KSCms-UHC-H, but replaces proportional Latin characters with half-width forms
Korean | KSCpc-EUC-H | Mac, KS X 1001:1992 with Mac OS KH extensions, Script Manager Code 3
Note On MVS the CMap files must be installed from an alternate package which contains CMaps with shortened file names.
Code pages for custom CJK fonts. On Windows PDFlib supports any CJK code page installed on the system. On other platforms the code pages listed in Table 4.8 can be used. These code pages will be mapped internally to the corresponding CMap (e.g. cp932 will be mapped to 90ms-RKSJ-H/V). Because of this mapping the appropriate CMaps must be configured (see above). The textformat parameter must be set to auto, and the text must be supplied in a format which is compatible with the chosen code page.
Table 4.8 CJK code pages (must be used with textformat=auto or textformat=bytes)
locale | code page | format | character set
Simplified Chinese | cp936 | GBK | GBK
Traditional Chinese | cp950 | Big Five | Big Five with Microsoft extensions
Japanese | cp932 | Shift-JIS | JIS X 0208:1997 with Microsoft extensions
Korean | cp949 | UHC | KS X 1001:1992, remaining 8822 hangul as extension
Korean | cp1361 | Johab | Johab
5 Font Handling
5.1 Overview of Fonts and Encodings
Font handling is one of the most complex aspects of document formats. In this section we will summarize PDFlib's main characteristics with regard to font handling.
> Japanese gaiji (user-defined characters) which are not available in any predefined font or encoding.
> Encodings specific to a particular font. These are also called font-specific or builtin encodings.
Wide-character addressing. In addition to 8-bit encodings, various other addressing schemes are supported which are much more powerful, and not subject to the 256 character limit.
> Purely Unicode-based addressing via the unicode encoding keyword. In this case the client directly supplies Unicode strings to PDFlib. The Unicode strings may be formatted according to one of several standard methods (such as UTF-16, UTF-8) and byte orderings (little-endian or big-endian).
> CMap-based addressing for a variety of Chinese, Japanese, and Korean standards. PDFlib supports all CMaps supported by Acrobat (see Section 6.4, Chinese, Japanese, and Korean Text Output, page 140).
> Glyph id addressing for TrueType and OpenType fonts via the glyphid encoding keyword. This is useful for advanced text processing applications which need access to individual glyphs in a font without reference to any particular encoding scheme, or must address glyphs which do not have any Unicode mapping. The number of valid glyph ids in a font can be queried with the maxcode keyword in PDF_info_font( ).
> Direct CID addressing: this is mainly useful for creating CJK character collection tables.
PostScript glyph names. In order to write a custom encoding file or find fonts which can be used with one of the supplied encodings you will have to find information about the exact definition of the character set to be defined by the encoding, as well as the glyph names used in the font files. You must also ensure that a chosen font provides all necessary characters for the encoding. If you happen to have the FontLab1 font editor (by the way, a great tool for dealing with all kinds of font and encoding issues), you may use it to find out about the encodings supported by a given font (look for code pages in the FontLab documentation).2 In order to address glyphs in a font by their name you can use PDFlib's syntax for glyph names (see Section 4.5.2, Character References and Glyph Name References, page 96).
SING fonts have been developed as a solution to the Gaiji problem with CJK text, i.e. custom glyphs which are not encoded in Unicode or any of the common CJK legacy encodings.
1. See www.fontlab.com
2. Information about the glyph names used in PostScript fonts can be found at partners.adobe.com/asn/tech/type/unicodegn.jsp (although font vendors are not required to follow these glyph naming recommendations).
SING fonts usually contain only a single glyph (they may also contain an additional vertical variant). The Unicode value of this glyph can be retrieved with PDFlib by requesting its glyph ID and subsequently the Unicode value for this glyph ID:
maingid = (int) p.info_font(font, "maingid", "");
uv = (int) p.info_font(font, "unicode", "gid=" + maingid);
It is recommended to use SING fonts as a fallback font with the gaiji suboption of the forcechars option of the fallbackfonts option of PDF_load_font( ).
Cookbook A full code sample can be found in the Cookbook topic fonts/starter_fallback.
The low-cost FontLab SigMaker tool can be used to generate SING fonts based on an existing image or glyph from another font:
www.fontlab.com/font-utility/sigmaker/
Cookbook Full code samples can be found in the Cookbook topics fonts/starter_type3font, fonts/type3_bitmaptext, fonts/type3_rasterlogo, and fonts/type3_vectorlogo.
The font will be registered in PDFlib, and its name can be supplied to PDF_load_font( ) along with an encoding which contains the names of the glyphs in the Type 3 font. Please note the following when working with Type 3 fonts:
> Similar to patterns and templates, images cannot be opened within a glyph description. However, they can be opened before starting a glyph description, and placed within the glyph description. Alternatively, inline images may be used for small bitmaps to overcome this restriction.
> Due to restrictions in PDF consumers, all characters used in text output must actually be defined in the font: if character code x is to be displayed with PDF_show( ) or a similar function, and the encoding contains glyphname at position x, then glyphname must have been defined via PDF_begin_glyph( ). This restriction affects only Type 3 fonts; missing glyphs in PostScript Type 1, TrueType, or OpenType fonts will simply be ignored.
> Some PDF consumers require a glyph named .notdef if codes will be used for which the corresponding glyph names are not defined in the font. Acrobat 8 may even crash if a .notdef glyph is not present. The .notdef glyph must be present, but it may simply contain an empty glyph description.
> When normal bitmap data is used to define characters, unused pixels in the bitmap will print as white, regardless of the background. In order to avoid this and have the original background color shine through, use the mask parameter for constructing the bitmap image.
> The interpolate option for images may be useful for enhancing the screen and print appearance of Type 3 bitmap fonts.
> Type 3 fonts do not contain any typographic properties such as ascender, descender, etc. However, these can be set by using the corresponding options in PDF_load_font( ).
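The consistency rule for Type 3 fonts can be checked before generating output. The following Java sketch is hypothetical: the encoding array and the set of glyph names defined via PDF_begin_glyph( ) are assumed inputs, and PDFlib's own checking is not reproduced. It reports a missing .notdef glyph and every text code whose glyph name is not defined in the font.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical pre-flight check for the Type 3 rule above: every code used
// in text output must map, through the encoding, to a glyph that has been
// defined in the font. Inputs are assumed to be known to the application.
final class Type3CheckSketch {
    private Type3CheckSketch() {}

    static List<String> problems(byte[] text, String[] encoding, Set<String> definedGlyphs) {
        Set<String> problems = new LinkedHashSet<>();   // keeps first-seen order, drops repeats
        if (!definedGlyphs.contains(".notdef")) {
            problems.add("no .notdef glyph defined");
        }
        for (byte b : text) {
            int code = b & 0xFF;
            String name = code < encoding.length ? encoding[code] : null;
            // Check for null first: Set.of(...) rejects contains(null).
            if (name == null || !definedGlyphs.contains(name)) {
                problems.add("code " + code + " (" + name + ") has no glyph definition");
            }
        }
        return new ArrayList<>(problems);
    }
}
```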
Type 3 fonts. Type 3 fonts must be defined at runtime by defining their glyphs with standard PDFlib graphics functions (see Section 5.2.4, User-Defined (Type 3) Fonts, page 108). If the font name supplied to PDF_begin_font( ) matches the font name requested with PDF_load_font( ), the font will be selected, for example:
font = p.load_font("PDFlibLogoFont", "logoencoding", "");
Font outline files. The font name is related to the name of a disk-based or virtual font outline file via the FontOutline resource, for example:
p.set_parameter("FontOutline", "f1=/usr/fonts/DFHSMincho-W3.ttf");
font = p.load_font("f1", "unicode", "");
As an alternative to runtime configuration via PDF_set_parameter( ), the FontOutline resource can be configured in a UPR file (see Section 3.1.3, Resource Configuration and File Searching, page 52). In order to avoid absolute file names you can use the SearchPath resource category (again, the SearchPath resource category can alternatively be configured in a UPR file), for example:
p.set_parameter("SearchPath", "/usr/fonts");
p.set_parameter("FontOutline", "f1=DFHSMincho-W3.ttf");
font = p.load_font("f1", "unicode", "");
For PostScript fonts the corresponding resource configuration must relate the font metrics and outline data (the latter only if embedding is requested, see Section 5.3.3, Font Embedding, page 115) to the corresponding disk file(s):
p.set_parameter("FontOutline", "f1=LuciduxSans.pfa");
p.set_parameter("FontPFM", "f1=LuciduxSans.pfm");
font = p.load_font("f1", "unicode", "embedding");
In order to select a font which is contained in a TrueType Collection (TTC, see Section 6.4.2, Custom CJK Fonts, page 141) file, you can specify its name directly:
p.set_parameter("FontOutline", "f1=msgothic.ttc");
font = p.load_font("f1", "unicode", "");
The font name can be encoded in ASCII or Unicode, and will be matched against all names of all fonts in the TTC file. Alternatively, to select the n-th font in a TTC file you can specify the number n with a colon after the font name:
p.set_parameter("FontOutline", "f1=msgothic.ttc");
font = p.load_font("f1:0", "unicode", "");
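The name:n convention for TrueType Collections amounts to splitting off a numeric suffix after the last colon. A minimal, hypothetical Java sketch follows; an index of -1 means that no suffix was given, so the name is to be matched against the font names in the TTC file.

```java
// Hypothetical helper, not PDFlib code: splits a TTC font request of the
// form "name:n" into the base name and the font index n.
final class TtcNameSketch {
    private TtcNameSketch() {}

    /** Returns the index after the last colon, or -1 if there is no numeric suffix. */
    static int fontIndex(String fontName) {
        int colon = fontName.lastIndexOf(':');
        if (colon < 0 || colon == fontName.length() - 1) {
            return -1;
        }
        String suffix = fontName.substring(colon + 1);
        if (!suffix.chars().allMatch(Character::isDigit)) {
            return -1;
        }
        return Integer.parseInt(suffix);
    }

    /** Returns the font name without a numeric ":n" suffix. */
    static String baseName(String fontName) {
        return fontIndex(fontName) >= 0 ? fontName.substring(0, fontName.lastIndexOf(':')) : fontName;
    }
}
```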
PostScript font metric files. The font name is related to the name of a disk-based or virtual PostScript font metric file via the FontAFM or FontPFM resource. This is sufficient if font embedding is not required, for example:
p.set_parameter("FontAFM", "f2=carta.afm");
font = p.load_font("f2", "builtin", "");
Host font aliases. The font name is related to the name of a host font via the HostFont resource. For example, to replace one of the Latin core fonts (see below) with a host font installed on the system you must configure the font in the HostFont resource category. The following line makes sure that instead of using the built-in core font data, the Symbol font metrics and outline data will be taken from the host system:
p.set_parameter("HostFont", "Symbol=Symbol");
font = p.load_font("Symbol", "builtin", "embedding");
Latin core fonts. PDF viewers support a core set of 14 fonts which are assumed to be always available. Full metrics information for the core fonts is already built into PDFlib so that no additional data are required (unless the font is to be embedded). The core fonts have the following names: Courier, Courier-Bold, Courier-Oblique, Courier-BoldOblique, Helvetica, Helvetica-Bold, Helvetica-Oblique, Helvetica-BoldOblique, Times-Roman, Times-Bold, Times-Italic, Times-BoldItalic, Symbol, ZapfDingbats The following code fragment requests one of the core fonts without any configuration:
Host fonts. If the font name matches the name of a system font (also known as a host font) on Windows or Mac it will be selected. See Section 5.3.2, Host Fonts on Windows and Mac, page 112, for more details on host fonts. Example:
font = p.load_font("Verdana", "unicode", "");
On Windows an optional font style can be added to the font name after a comma:
font = p.load_font("Verdana,Bold", "unicode", "");
Host font names can be encoded in ASCII. On Windows, Unicode can also be used.
Extension-based search for font files. If PDFlib couldn't find any font with the specified name, it will loop over all entries in the SearchPath resource category, and add all known file name suffixes to the supplied font name in an attempt to locate the font metrics or outline data. The details of the extension-based search algorithm are as follows:
> The following suffixes will be added to the font name, and the resulting file names tried one after the other to locate the font metrics (and outline in the case of TrueType and OpenType fonts):
.tte .ttf .otf .gai .afm .pfm .ttc .TTE .TTF .OTF .GAI .AFM .PFM .TTC
> If embedding is requested for a PostScript font, the following suffixes will be added to the font name and tried one after the other to find the font outline file:
.pfa .pfb .PFA .PFB
> All trial file names above will be searched for as is, and then by prepending all directory names configured in the SearchPath resource category.
This means that PDFlib will find a font without any manual configuration, provided the corresponding font file name consists of the font name plus the standard file name suffix for the font type, and is located in one of the SearchPath directories. The following groups of statements will achieve the same effect with respect to locating the font outline file:
p.set_parameter("FontOutline", "Arial=/usr/fonts/Arial.ttf");
font = p.load_font("Arial", "unicode", "");
and
p.set_parameter("SearchPath", "/usr/fonts");
font = p.load_font("Arial", "unicode", "");
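The candidate file names tried by the extension-based search can be sketched as follows. The class is hypothetical, joins directories with a forward slash for simplicity, and assumes that each trial name is tried as is and then in each SearchPath directory before the next suffix is tried; the text does not specify the exact interleaving.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the trial file names produced by the
// extension-based font search; not PDFlib's implementation.
final class FontSearchSketch {
    private FontSearchSketch() {}

    // Suffixes tried for font metrics (and outlines of TrueType/OpenType fonts).
    static final List<String> METRICS_SUFFIXES = List.of(
            ".tte", ".ttf", ".otf", ".gai", ".afm", ".pfm", ".ttc",
            ".TTE", ".TTF", ".OTF", ".GAI", ".AFM", ".PFM", ".TTC");

    // Suffixes tried for PostScript outlines when embedding is requested.
    static final List<String> OUTLINE_SUFFIXES = List.of(".pfa", ".pfb", ".PFA", ".PFB");

    static List<String> candidates(String fontName, List<String> suffixes, List<String> searchPath) {
        List<String> result = new ArrayList<>();
        for (String suffix : suffixes) {
            String file = fontName + suffix;
            result.add(file);                    // try the name as is
            for (String dir : searchPath) {
                result.add(dir + "/" + file);    // then in each SearchPath directory
            }
        }
        return result;
    }
}
```

With SearchPath set to /usr/fonts, the candidates for Arial include /usr/fonts/Arial.ttf, which is why the two groups of statements above locate the same file.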
When working with host fonts it is important to use the exact (case-sensitive) font name. Since font names are crucial, we mention some platform-specific methods for determining font names below. More information on font names can be found in Section 5.2.1, PostScript Type 1 Fonts, page 106, and Section 5.2.2, TrueType and OpenType Fonts, page 107.
Finding host font names on Windows. You can easily find the name of an installed font by double-clicking the font file, and taking note of the full font name which will be displayed in the first line of the resulting window. Some fonts may have parts of their name localized according to the respective Windows version in use. For example, the common font name portion Bold may appear as the translated word Fett on a German system. In order to retrieve the host font data from the Windows system you must use the translated form of the font name in PDFlib (e.g. Arial Fett), or use font style names (see below). However, in order to retrieve the font data directly from file you must use the generic (non-localized) form of the font name (e.g. Arial Bold).
Note You can avoid this internationalization problem by appending font style names (e.g. ,Bold, see below) to the font name instead of using localized font name variants.
If you want to examine TrueType fonts in more detail, take a look at Microsoft's free font properties extension1, which will display many entries of the font's TrueType tables in human-readable form.
Windows font style names. When loading host fonts from the Windows operating system PDFlib users have access to a feature provided by the Windows font selection machinery: style names can be provided for the weight and slant, for example
font = p.load_font("Verdana,Bold", "unicode", "");
This will instruct Windows to search for a particular bold, italic, or other variation of the base font. Depending on the available fonts, Windows will select a font which most closely resembles the requested style (it will not create a new font variation). The font found by Windows may be different from the requested font, and the font name in the generated PDF may be different from the requested name; PDFlib does not have any control over Windows font selection. Font style names only work with host fonts, not with fonts configured via a font file. The following keywords (separated from the font name with a comma) can be attached to the base font name to specify the font weight:
none, thin, extralight, ultralight, light, normal, regular, medium, semibold, demibold, bold, extrabold, ultrabold, heavy, black
The keywords are case-insensitive. The italic keyword can be specified instead of, or in addition to, the above. If two style names are used, they must be separated with a comma, for example:
font = p.load_font("Verdana,Bold,Italic", "unicode", "");
Numerical font weight values can be used as an equivalent alternative to font style names:
1. See www.microsoft.com/typography/TrueTypeProperty21.mspx
0 (none), 100 (thin), 200 (extralight), 300 (light), 400 (normal), 500 (medium), 600 (semibold), 700 (bold), 800 (extrabold), 900 (black)
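The Windows style name syntax can be sketched as a small parser that splits a request such as Verdana,Bold,Italic into base name, numeric weight, and italic flag. This is a hypothetical helper: the numbers for the keywords listed above follow the table, while the synonyms (ultralight, regular, demibold, ultrabold, heavy) are assumed to map like the corresponding Windows FW_* weight constants. Actual font selection is done by Windows, not by this code.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical parser for Windows host font requests of the form
// "Base,Style[,Style]"; not PDFlib code.
final class WinStyleSketch {
    private WinStyleSketch() {}

    static final Map<String, Integer> WEIGHTS = Map.ofEntries(
            Map.entry("none", 0), Map.entry("thin", 100),
            Map.entry("extralight", 200), Map.entry("ultralight", 200),
            Map.entry("light", 300), Map.entry("normal", 400), Map.entry("regular", 400),
            Map.entry("medium", 500), Map.entry("semibold", 600), Map.entry("demibold", 600),
            Map.entry("bold", 700), Map.entry("extrabold", 800), Map.entry("ultrabold", 800),
            Map.entry("heavy", 900), Map.entry("black", 900));

    static final class Style {
        final String baseName;
        final int weight;        // -1 if no weight was requested
        final boolean italic;

        Style(String baseName, int weight, boolean italic) {
            this.baseName = baseName;
            this.weight = weight;
            this.italic = italic;
        }
    }

    static Style parse(String request) {
        String[] parts = request.split(",");
        int weight = -1;
        boolean italic = false;
        for (int k = 1; k < parts.length; k++) {
            String kw = parts[k].trim().toLowerCase(Locale.ROOT);  // keywords are case-insensitive
            if (kw.equals("italic")) {
                italic = true;
                continue;
            }
            Integer w = WEIGHTS.get(kw);
            if (w == null) {
                // Numerical weight values are an equivalent alternative.
                try {
                    w = Integer.valueOf(kw);
                } catch (NumberFormatException e) {
                    throw new IllegalArgumentException("unknown style name: " + parts[k]);
                }
            }
            weight = w;
        }
        return new Style(parts[0].trim(), weight, italic);
    }
}
```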
Note Windows style names for fonts may be useful if you have to deal with localized font names, since they provide a universal method to access font variations regardless of their localized names.
Note Do not confuse the Windows style name convention with the fontstyle option which looks similar, but works on a completely different basis.
Potential problem with host font access on Windows. We'd like to alert users to a potential problem with font installation on Windows. If you install fonts via the "File, Install new font..." menu item (as opposed to dragging fonts to the Fonts directory) there's a check box "Copy fonts to Fonts folder". If this box is unchecked, Windows will only place a shortcut (link) to the original font file in the fonts folder. In this case the original font file must live in a directory which is accessible to the application using PDFlib. In particular, font files outside of the Windows Fonts directory may not be accessible to IIS with default security settings. Solution: either copy font files to the Fonts directory, or place the original font file in a directory where IIS has read permission. Similar problems may arise with Adobe Type Manager (ATM) if the "Add without copying fonts" option is checked while installing fonts.
Host font names on the Mac. Using the Font Book utility, which is part of Mac OS X, you can find the names of installed host fonts. In order to programmatically create lists of host fonts we recommend Apple's freely available Font Tools1. This suite of command-line utilities contains a program called ftxinstalledfonts which is useful for determining the exact names of all installed fonts. PDFlib supports several flavors of host font names:
> QuickDraw font names: these are old-style font names (in Mac OS supported only by deprecated font functions) which have been in use for a long time on Mac OS systems, but are considered outdated. In order to determine QuickDraw font names issue the following command in a terminal window:
ftxinstalledfonts -q
> Unique font names: these are newer font names (in Mac OS supported by new ATS font functions) which can be encoded in Unicode, e.g. for East-Asian fonts. In order to determine unique font names issue the following command in a terminal window (in some cases the output contains entries with a ":" which must be removed):
ftxinstalledfonts -u
> PostScript font names. In order to determine PostScript font names issue the following command in a terminal window:
ftxinstalledfonts -p
1. See developer.apple.com/textfonts/download
Note The Leopard builds of PDFlib (for Mac OS X 10.5 and above) support all three kinds of host font names. Non-Leopard builds accept only QuickDraw font names.
Potential problems with host font access on the Mac. In our testing we found that newly installed fonts are sometimes not accessible for UI-less applications such as PDFlib until the user logs out from the console, and logs in again. On Mac OS X 10.5 (Leopard) host fonts are not available to programs running in a terminal session from a remote computer. This is not a restriction of PDFlib; it also affects other programs such as Font Tools. This problem has been fixed in Mac OS X 10.5.6.
Alternatively, a font descriptor containing only the character metrics and some general information about the font (without the actual glyph outlines) can be embedded. If a font is not embedded in a PDF document, Acrobat will take it from the target system if available, or construct a substitute font according to the font descriptor.
Table 5.1 lists different situations with respect to font usage, each of which poses different requirements on the font and metrics files required by PDFlib. In addition to the requirements listed in Table 5.1, the corresponding CMap files (plus in some cases the Unicode mapping CMap for the respective character collection, e.g. Adobe-Japan1-UCS2) must be available in order to use a (standard or custom) CJK font with any of the standard CMaps.
When a font with font-specific encoding (a symbol font) or one containing glyphs outside Adobe's Standard Latin character set is used but not embedded in the PDF output, the resulting PDF will be unusable unless the font is already natively installed on the target system (since Acrobat can only simulate Latin text fonts). Such PDF files are inherently nonportable, although they may be of use in controlled environments, such as intra-corporate document exchange.
Table 5.1 Different font usage situations and required files
font usage | font metrics file must be available? | font outline file must be available?
one of the 14 core fonts | no | only if embedding is desired
TrueType, OpenType, or PostScript Type 1 host font installed on the Mac or Windows system | no | no
non-core PostScript fonts | yes | only if embedding is desired
TrueType fonts | n/a | yes
OpenType fonts, incl. CJK TrueType and OpenType fonts and SING fonts | n/a | yes
standard CJK fonts¹ | no | no
1. See Section 6.4, Chinese, Japanese, and Korean Text Output, page 140, for more information on CJK fonts.
Legal aspects of font embedding. It's important to note that mere possession of a font file may not justify embedding the font in PDF, even for holders of a legal font license. Many font vendors restrict embedding of their fonts. Some type foundries completely forbid PDF font embedding, others offer special online or embedding licenses for their fonts, while still others allow font embedding provided subsetting is applied to the font. Please check the legal implications of font embedding before attempting to embed fonts with PDFlib. PDFlib will honor embedding restrictions which may be specified in a TrueType or OpenType font. If the embedding flag in a TrueType font is set to "no embedding"¹, PDFlib will honor the font vendor's request and reject any attempt at embedding the font.
The default value of subsetlimit is 100 percent. In other words, the subsetting option requested at PDF_load_font( ) will be honored unless the client explicitly requests a lower limit than 100 percent.
> If autosubsetting=true: the subsetminsize parameter can be used to completely disable subsetting for small fonts. If the original font file is smaller than the value of subsetminsize in KB, font subsetting will be disabled for this font.
> If autosubsetting=false but subsetting is desired for a particular font nevertheless, the subsetting option must be supplied to PDF_load_font( ):
font = p.load_font("WarnockPro", "winansi", "subsetting");
Embedding and subsetting TrueType fonts. If a TrueType font is used with an encoding other than winansi and macroman, it will be converted to a CID font for PDF output by default. For encodings which contain only characters from the Adobe Glyph List (AGL) this can be prevented by setting the autocidfont parameter to false.
Type 3 font subsetting. Type 3 fonts must be defined, and therefore embedded, before they can be used in a document (because the glyph widths are required). On the other hand, subsetting is only possible after creating all pages (since the glyphs used in the document must be known to determine the proper subset). In order to avoid this conflict, PDFlib supports widths-only Type 3 fonts. If you need subsetting for a Type 3 font you must define the font in two passes:
> The first pass with the widthsonly option of PDF_begin_font( ) must be done before using the font. It defines only the font and glyph metrics (widths); the font matrix in PDF_begin_font( ) as well as wx and the glyph bounding box in PDF_begin_glyph( ) must be supplied and must accurately describe the actual glyph metrics. Only PDF_begin_glyph( ) and PDF_end_glyph( ) are required for each glyph, but not any other calls for defining the actual glyph shape. If other functions are called between start and end of a glyph description, they will not have any effect on the PDF output, and will not raise any exception.
> The second pass must be done after creating all text in this font, and defines the actual glyph outlines or bitmaps. Font and glyph metrics will be ignored since they are already known from the first pass. After the last page has been created, PDFlib also knows which glyphs have been used in the document, and will only embed the required glyph descriptions to construct the font subset.
The same set of glyphs must be provided in pass 1 and pass 2. A Type 3 font with subsetting can only be loaded once with PDF_load_font( ).
Cookbook A full code sample can be found in the Cookbook topic fonts/type3_subsetting.
1. More specifically: if the fsType flag in the OS/2 table of the font has a value of 2.
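The two-pass scheme can be pictured with a small conceptual model. The sketch below is hypothetical and does not call PDFlib: the class Type3SubsetModel and its methods are invented for illustration. Pass 1 supplies only widths, placing text needs nothing more than those widths and records which glyphs were used, and pass 2 supplies every outline but keeps only the used ones.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical bookkeeping model of widths-only Type 3 subsetting (not the PDFlib API).
final class Type3SubsetModel {
    // Pass 1: glyph name -> advance width (wx), known before any text is placed.
    private final Map<String, Double> widths = new LinkedHashMap<>();
    // Glyphs actually referenced by text output, in order of first use.
    private final Set<String> used = new LinkedHashSet<>();

    void defineWidth(String glyph, double wx) {
        widths.put(glyph, wx);
    }

    // Placing text only needs the widths from pass 1; it also records glyph usage.
    double placeText(String... glyphs) {
        double advance = 0;
        for (String g : glyphs) {
            Double wx = widths.get(g);
            if (wx == null) {
                throw new IllegalArgumentException("glyph not defined in pass 1: " + g);
            }
            used.add(g);
            advance += wx;
        }
        return advance;
    }

    // Pass 2: outlines for all glyphs are supplied, but only the used ones are kept.
    Map<String, String> embedSubset(Map<String, String> outlines) {
        if (!outlines.keySet().equals(widths.keySet())) {
            throw new IllegalStateException("pass 1 and pass 2 must define the same glyphs");
        }
        Map<String, String> subset = new LinkedHashMap<>();
        for (String g : used) {
            subset.put(g, outlines.get(g));
        }
        return subset;
    }
}
```

In a real document the corresponding steps are PDF_begin_font( ) with widthsonly, the text output calls, and the second font definition after the last page; the model only mirrors which glyphs end up in the subset.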
> The underline/overline/strikeout features must be used with care when working with fallback fonts, and the same applies to the ascender and similar typographic values. The underline thickness or position defined in the base font may not match the values in the fallback font. As a result, the underline position or thickness may jump in unpleasant ways. A simple workaround against such artifacts is to specify a unified value with the underlineposition and underlinewidth options of PDF_fit_textline( ) and PDF_add_textflow( )/PDF_create_textflow( ). This value should be selected so that it works with the base font and all fallback fonts.
Combine fonts for use with multiple scripts. In some situations the script of incoming text data is not known in advance. For example, a database may contain Latin, Greek, and Cyrillic text, but the available fonts cover only one of these scripts at a time. Instead of determining the script and selecting an appropriate font you can construct a superfont which chains together several fonts, effectively covering the superset of all scripts. Use the following font loading option list for the fallbackfonts option to add Greek and Cyrillic fonts to a Latin font:
fallbackfonts={ {fontname=Times-Greek encoding=unicode embedding forcechars={U+0391-U+03F5}} {fontname=Times-Cyrillic encoding=unicode embedding forcechars={U+0401-U+0490}} }
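To see which font ends up rendering a given character in such a superfont, the following hypothetical model (not PDFlib code) maps each code point to a font name. It assumes the base font is called Times-Roman and that forcechars ranges are checked in the order given; both are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model (not PDFlib code) of a superfont built with forcechars ranges.
final class SuperfontModel {
    private static final class Fallback {
        final String fontname;
        final int first;
        final int last;

        Fallback(String fontname, int first, int last) {
            this.fontname = fontname;
            this.first = first;
            this.last = last;
        }
    }

    private final String baseFont;
    private final List<Fallback> fallbacks = new ArrayList<>();

    SuperfontModel(String baseFont) {
        this.baseFont = baseFont;
    }

    // Register a fallback font for the inclusive range first..last,
    // comparable to forcechars={U+first-U+last}.
    SuperfontModel force(String fontname, int first, int last) {
        fallbacks.add(new Fallback(fontname, first, last));
        return this;
    }

    // Assumption: fallbacks are checked in registration order;
    // characters outside all ranges are taken from the base font.
    String fontFor(int codePoint) {
        for (Fallback f : fallbacks) {
            if (codePoint >= f.first && codePoint <= f.last) {
                return f.fontname;
            }
        }
        return baseFont;
    }
}
```

With the ranges from the option list above, a Latin A stays in the base font, while U+03A3 (GREEK CAPITAL LETTER SIGMA) is taken from Times-Greek and U+0414 (CYRILLIC CAPITAL LETTER DE) from Times-Cyrillic.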
Extend 8-bit encodings. If your input data is restricted to a legacy 8-bit encoding you can nevertheless use characters outside this encoding, taking advantage of fallbackfonts (where the base font itself serves as a fallback font) and PDFlib's character reference mechanism to address characters outside the encoding. Assuming you loaded the Helvetica font with encoding=iso8859-1 (this encoding does not include the Euro glyph), you can use the following font loading option list for the fallbackfonts option to add the Euro glyph to the font:
fallbackfonts={{fontname=Helvetica encoding=unicode forcechars=euro}}
Since the input encoding does not include the Euro character you cannot address it with an 8-bit value. In order to work around this restriction use character or glyph name references, e.g. &euro; (see Section 4.5.2, Character References and Glyph Name References, page 96).
Use Euro glyph from another font. In a slightly different scenario the base font doesn't include a Euro glyph. Use the following font loading option list for the fallbackfonts option to pull the Euro glyph from another font:
We used the textrise suboption to slightly move down the Euro glyph. Enlarge some or all glyphs in a font. Fallback fonts can also be used to enlarge some or all glyphs in a font without changing the font size. Again, the base font itself will be used as fallback font. This feature can be useful to make different font designs visually compatible without adjusting the fontsize in the code. Use the following font loading option list for the fallbackfonts option to enlarge all glyphs in the specified range to 120%:
fallbackfonts={ {fontname=Times-Italic encoding=unicode forcechars={U+0020-U+00FF} fontsize=120%} }
Add an enlarged pictogram. Use the following font loading option list for the fallbackfonts option to pull a symbol from the ZapfDingbats font:
fallbackfonts={ {fontname=ZapfDingbats encoding=unicode forcechars=.a12 fontsize=150% textrise=-15%} }
Again, we use the fontsize and textrise suboptions to adjust the symbol's size and position to the base font.
Replace glyphs in a CJK font. You can use the following font loading option list for the fallbackfonts option to replace the Latin characters in the ASCII range with those from another font:
fallbackfonts={ {fontname=Courier-Bold encoding=unicode forcechars={U+0020-U+007E}} }
Identify missing glyphs. The font Unicode BMP Fallback SIL, which is freely available, displays the hexadecimal value of each Unicode value instead of the actual glyph. This font can be very useful for diagnosing font-related problems in the workflow. You can use the following font loading option list for the fallbackfonts option to augment any font with this special fallback font to visualize missing characters:
fallbackfonts={{fontname={Unicode BMP Fallback SIL} encoding=unicode}}
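Option lists like the ones above can also be assembled programmatically. The helper FallbackOptions below is a hypothetical sketch, not part of PDFlib; it relies only on the syntax visible in the examples: suboptions are written as key=value pairs, each fallback font is enclosed in braces, and a value containing blanks (such as {Unicode BMP Fallback SIL}) is wrapped in braces. List values such as forcechars ranges are expected to be passed with their braces already in place.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not part of PDFlib) that assembles a fallbackfonts option list.
final class FallbackOptions {
    // Option list values that contain whitespace must be enclosed in braces.
    static String quote(String value) {
        boolean hasSpace = value.chars().anyMatch(Character::isWhitespace);
        return hasSpace ? "{" + value + "}" : value;
    }

    // Each map holds the suboptions of one fallback font. Pass an insertion-ordered
    // map (e.g. LinkedHashMap) to keep the suboptions in a predictable order.
    // An empty value denotes a boolean suboption such as "embedding".
    static String build(List<Map<String, String>> fonts) {
        List<String> entries = new ArrayList<>();
        for (Map<String, String> font : fonts) {
            List<String> parts = new ArrayList<>();
            for (Map.Entry<String, String> e : font.entrySet()) {
                String value = e.getValue();
                parts.add(value.isEmpty() ? e.getKey() : e.getKey() + "=" + quote(value));
            }
            entries.add("{" + String.join(" ", parts) + "}");
        }
        return "fallbackfonts={" + String.join(" ", entries) + "}";
    }
}
```

A single entry with fontname Unicode BMP Fallback SIL and encoding unicode yields exactly the option list shown above; the resulting string would then become part of the option list for PDF_load_font( ).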
Add Gaiji characters to a font. This use case is detailed in Section 6.4.3, EUDC and SING Fonts for Gaiji Characters, page 143.
replace
same as none same as none same as none invalid code is replaced with 0 or replacementchar
error
The API function will throw an exception if an error occurs. A detailed error message can be queried with PDF_get_errmsg( ) even if the function does not return a -1 (in PHP:0) error code.
Glyph replacement. If glyphcheck=replace, unavailable glyphs will recursively be replaced as follows:
> Select a similar glyph according to the Unicode value from PDFlib's internal replacement table. The following (incomplete) list contains some of these glyph mappings. If the first character in the list is unavailable in a font, it will automatically be replaced with the second:
U+00A0 (NO-BREAK SPACE) → U+0020 (SPACE)
U+00AD (SOFT HYPHEN) → U+002D (HYPHEN-MINUS)
U+2010 (HYPHEN) → U+002D (HYPHEN-MINUS)
U+03BC (GREEK SMALL LETTER MU) → U+00B5 (MICRO SIGN)
U+212B (ANGSTROM SIGN) → U+00C5 (LATIN CAPITAL LETTER A WITH RING ABOVE)
U+220F (N-ARY PRODUCT) → U+03A0 (GREEK CAPITAL LETTER PI)
U+2126 (OHM SIGN) → U+03A9 (GREEK CAPITAL LETTER OMEGA)
In addition to the internal table, the fullwidth characters U+FF01 to U+FF5E will be replaced with the corresponding ISO 8859-1 characters (i.e. U+0021 to U+007E) if the fullwidth variants are not available in the font.
> Decompose Unicode ligatures into their constituent glyphs (e.g. replace U+FB00 with U+0066 U+0066).
> Select glyphs with the same Unicode semantics according to their glyph name. In particular, all glyph name suffixes separated with a period will be removed if the corresponding glyph is not available (e.g. replace A.swash with A; replace g.alt with g).
If no replacement was found, the character specified in the replacementchar option will be used. If the corresponding glyph itself is not available in the font, U+00A0 (NO-BREAK SPACE) and U+0020 (SPACE) will be tried; if these are still unavailable, U+0000 (missing glyph symbol) will be used.
Cookbook A full code sample can be found in the Cookbook topic fonts/glyph_replacement.
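The replacement cascade can be summarized in code. The following is a simplified, hypothetical model of the steps listed above, not PDFlib's implementation: the glyphs available in a font are passed in as a set of code points, only the table entries quoted above plus three common Latin ligatures (U+FB00 to U+FB02) are included, and the glyph name step is shown as a separate method that works on names.

```java
import java.util.Map;
import java.util.Set;

// Simplified model of the glyphcheck=replace cascade described above.
// A sketch for illustration only, not PDFlib's actual implementation.
final class GlyphReplacement {
    // Excerpt of the internal replacement table (unavailable -> substitute).
    private static final Map<Integer, Integer> TABLE = Map.of(
            0x00A0, 0x0020,  // NO-BREAK SPACE -> SPACE
            0x00AD, 0x002D,  // SOFT HYPHEN -> HYPHEN-MINUS
            0x2010, 0x002D,  // HYPHEN -> HYPHEN-MINUS
            0x03BC, 0x00B5,  // GREEK SMALL LETTER MU -> MICRO SIGN
            0x212B, 0x00C5,  // ANGSTROM SIGN -> A WITH RING ABOVE
            0x220F, 0x03A0,  // N-ARY PRODUCT -> GREEK CAPITAL LETTER PI
            0x2126, 0x03A9); // OHM SIGN -> GREEK CAPITAL LETTER OMEGA

    // Assumed subset of Unicode ligatures and their constituent characters.
    private static final Map<Integer, int[]> LIGATURES = Map.of(
            0xFB00, new int[] {0x66, 0x66},  // ff
            0xFB01, new int[] {0x66, 0x69},  // fi
            0xFB02, new int[] {0x66, 0x6C}); // fl

    static int[] replace(int cp, Set<Integer> available, int replacementChar) {
        if (available.contains(cp)) {
            return new int[] {cp};
        }
        // Step 1: internal table, followed recursively until an available glyph is found.
        for (Integer m = TABLE.get(cp); m != null; m = TABLE.get(m)) {
            if (available.contains(m)) {
                return new int[] {m};
            }
        }
        // Fullwidth forms U+FF01..U+FF5E fall back to U+0021..U+007E.
        if (cp >= 0xFF01 && cp <= 0xFF5E && available.contains(cp - 0xFEE0)) {
            return new int[] {cp - 0xFEE0};
        }
        // Step 2: decompose a ligature if all of its parts are available.
        int[] parts = LIGATURES.get(cp);
        if (parts != null && allAvailable(parts, available)) {
            return parts.clone();
        }
        // Step 3 works on glyph names, see stripSuffixes below.
        // Last resort: replacementchar, NO-BREAK SPACE, SPACE, then missing glyph U+0000.
        for (int candidate : new int[] {replacementChar, 0x00A0, 0x0020}) {
            if (available.contains(candidate)) {
                return new int[] {candidate};
            }
        }
        return new int[] {0x0000};
    }

    private static boolean allAvailable(int[] cps, Set<Integer> available) {
        for (int c : cps) {
            if (!available.contains(c)) {
                return false;
            }
        }
        return true;
    }

    // Remove period-separated suffixes (A.swash -> A) until an available name is found.
    static String stripSuffixes(String name, Set<String> availableNames) {
        String current = name;
        while (!availableNames.contains(current) && current.lastIndexOf('.') > 0) {
            current = current.substring(0, current.lastIndexOf('.'));
        }
        return current;
    }
}
```

For a font containing only SPACE, HYPHEN-MINUS, question mark, A and f, a SOFT HYPHEN becomes HYPHEN-MINUS, fullwidth A becomes A, U+FB00 becomes two f glyphs, and GREEK SMALL LETTER MU falls through to the replacement character.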
Text fonts can be reencoded (adjusted to a certain code page or character set), while symbolic fonts can't, and must use builtin encoding instead. Nevertheless, all symbolic fonts can be used with encoding unicode in PDFlib. In this situation PDFlib assigns Unicode values from the Private Use Area (PUA). Encoding unicode for symbol fonts allows the use of character references (see Section 4.5.2, Character References and Glyph Name References, page 96) with font-specific glyph names. This is a big advantage since it allows you to select symbol glyphs based on their names, without getting bogged down in Unicode/encoding problems. As an exception, the widely used Symbol and ZapfDingbats fonts have standardized Unicode values (outside of the PUA). If they are loaded with unicode encoding, the glyphs can be addressed with the Unicode values U+2700 and up.
Builtin encoding for TrueType fonts. TrueType fonts with non-text characters, such as the Wingdings font, can be used with builtin encoding. If a font requires builtin encoding but the client requested a different encoding, PDFlib will enforce builtin encoding nevertheless.
Builtin encoding for OpenType fonts with PostScript outlines. OTF fonts with non-text characters must be used with builtin encoding. Some OTF fonts contain an internal default encoding. PDFlib will detect this case, and dynamically construct an encoding which is suited for this particular font. The encoding name builtin will be modified to builtin_<fontname> internally. Although this new encoding name can be used in future calls to PDF_load_font( ), it is only reasonable for use with the same font.
If the requested combination of keyword and option(s) is not available, PDF_info_font( ) will return -1. This must be checked by the client application and can be used to check whether or not a required glyph is present in a font. Each of the sample code lines below can be used in isolation since they do not depend on each other.
Query the Unicode value of an 8-bit code or a named glyph in an 8-bit encoding:
uv = (int) p.info_font(-1, "unicode", "code=" + c + " encoding=" + enc);
uv = (int) p.info_font(-1, "unicode", "glyphname=" + gn + " encoding=" + enc);
Query the registered glyph name of an 8-bit code or a Unicode value in an 8-bit encoding:
gn_idx = (int) p.info_font(-1, "glyphname", "code=" + c + " encoding=" + enc);
gn_idx = (int) p.info_font(-1, "glyphname", "unicode=" + uv + " encoding=" + enc);
/* retrieve the actual glyph name using the string index */
gn = p.get_parameter("string", gn_idx);
Unicode and glyph name queries. PDF_info_font( ) can also be used to perform queries which are independent of a specific 8-bit encoding, but affect the relationship of Unicode values and glyph names known to PDFlib internally. Since these queries are independent of any font, a valid font handle is not required.
Query the Unicode value of an internally known glyph name:
uv = (int) p.info_font(-1, "unicode", "glyphname=" + gn + " encoding=unicode");
Query the Unicode value for a code, glyph ID, named glyph, or CID in a Unicode-compatible font:
uv = (int) p.info_font(font, "unicode", "code=" + c);
uv = (int) p.info_font(font, "unicode", "glyphid=" + gid);
uv = (int) p.info_font(font, "unicode", "glyphname=" + gn);
uv = (int) p.info_font(font, "unicode", "cid=" + cid);
Query the glyph id for a code, Unicode value, named glyph, or CID in a font:
gid = (int) p.info_font(font, "glyphid", "code=" + c);
gid = (int) p.info_font(font, "glyphid", "unicode=" + uv);
gid = (int) p.info_font(font, "glyphid", "glyphname=" + gn);
gid = (int) p.info_font(font, "glyphid", "cid=" + cid);
Query the glyph id for a code, Unicode value, or named glyph in a font with respect to an arbitrary 8-bit encoding:
gid = (int) p.info_font(font, "glyphid", "code=" + c + " encoding=" + enc);
gid = (int) p.info_font(font, "glyphid", "unicode=" + uv + " encoding=" + enc);
gid = (int) p.info_font(font, "glyphid", "glyphname=" + gn + " encoding=" + enc);
Query the font-specific name of a glyph specified by code, Unicode value, glyph ID, or CID:
gn_idx = (int) p.info_font(font, "glyphname", "code=" + c);
gn_idx = (int) p.info_font(font, "glyphname", "unicode=" + uv);
gn_idx = (int) p.info_font(font, "glyphname", "glyphid=" + gid);
gn_idx = (int) p.info_font(font, "glyphname", "cid=" + cid);
/* retrieve the actual glyph name using the string index */ gn = p.get_parameter("string", gn_idx);
Checking glyph availability. Using PDF_info_font( ) you can check whether a particular font contains the glyphs you need for your application. As an example, the following code checks whether the Euro glyph is contained in a font:
/* We could also use "unicode=U+20AC" below */
if (p.info_font(font, "code", "unicode=euro") == -1) {
    /* no glyph for Euro sign available in the font */
}
Cookbook A full code sample can be found in the Cookbook topic fonts/glyph_availability.
Alternatively, you can call PDF_info_textline( ) to check the number of unmapped characters for a given text string, i.e. the number of characters in the string for which no appropriate glyph is available in the font. The following code fragment queries results for a string containing a single Euro character (which is expressed with a character reference). If one unmapped character is found, this means that the font does not contain any glyph for the Euro sign:
String optlist = "font=" + font + " charref";
if (p.info_textline("&euro;", "unmappedchars", optlist) == 1) {
    /* no glyph for Euro sign available in the font */
}
result = (int) p.info_font(font, "codepage", "name=" + cp);
if (result == -1)
    System.err.println("Codepage coverage unknown");
else if (result == 0)
    System.err.println("Codepage not supported by this font");
else
    System.err.println("Codepage supported by this font");
Retrieving a list of all supported codepages. The following fragment queries a list of all codepages supported by a TrueType or OpenType font:
cp_idx = (int) p.info_font(font, "codepagelist", "");
if (cp_idx == -1)
    System.err.println("Codepage list unknown");
else {
    System.err.println("Codepage list:");
    System.err.println(p.get_parameter("string", cp_idx));
}
This will create the following list for the common Arial font:
cp1252 cp1250 cp1251 cp1253 cp1254 cp1255 cp1256 cp1257 cp1258 cp874 cp932 cp936 cp949 cp950 cp1361
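Since the list is returned as a single string, a client will often want to test it for one particular codepage. The helper CodepageList below is a hypothetical sketch that assumes the string is whitespace-separated, as in the Arial output above; in practice the string would come from p.get_parameter("string", cp_idx).

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper: parse a whitespace-separated codepage list string.
final class CodepageList {
    static Set<String> parse(String list) {
        Set<String> result = new LinkedHashSet<>();
        for (String token : list.trim().split("\\s+")) {
            if (!token.isEmpty()) { // an empty input yields a single empty token
                result.add(token);
            }
        }
        return result;
    }

    static boolean supports(String list, String codepage) {
        return parse(list).contains(codepage);
    }
}
```

For the Arial list shown above, supports(list, "cp1251") is true, while a codepage that is not listed, such as cp437, is reported as unsupported.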
Query fallback glyphs. You can use PDF_info_font( ) to query the results of the fallback font mechanism (see Section 5.4, Fallback Fonts, page 118, for details on fallback fonts). The following fragment determines the name of the base or fallback font which is used to represent the specified Unicode character:
result = (int) p.info_font(basefont, "fallbackfont", "unicode=U+03A3");
/* if result == basefont the base font was used, and no fallback font was required */
if (result == -1) {
    /* character cannot be displayed with either the base font or a fallback font */
} else {
    idx = (int) p.info_font(result, "fontname", "api");
    fontname = p.get_parameter("string", idx);
}
6 Text Output
6.1 Text Output Methods
PDFlib supports text output on several levels:
> Low-level text output with PDF_show( ) and similar functions;
> Single-line formatted text output with PDF_fit_textline( );
> Multi-line formatted text output with Textflow (PDF_fit_textflow( ) and related functions);
> Text in tables.
Low-level text output. Functions like PDF_show( ) can be used to place text at a specific location on the page, without using any formatting aids. This is recommended only for applications with very basic text output requirements (e.g. converting plain text files to PDF), or for applications which already have full text placement information (e.g. a driver which converts a page in another format to PDF). The following fragment creates text output with low-level functions:
font = p.load_font("Helvetica", "unicode", "");
p.setfont(font, 12);
p.set_text_pos(50, 700);
p.show("Hello world!");
p.continue_text("(says Java)");
Formatted single-line text output with Textlines. PDF_fit_textline( ) creates text output which consists of single lines and offers a variety of formatting features. However, the position of individual Textlines must be determined by the client application. The following fragment creates text output with a Textline. Since font, encoding, and fontsize can be specified as options, a preceding call to PDF_load_font( ) is not required:
p.fit_textline(text, x, y, "fontname=Helvetica encoding=unicode fontsize=12");
See Section 8.1, Placing and Fitting Textlines, page 171, for more information about Textlines. Multi-line text output with Textflow. PDF_fit_textflow( ) creates text output with an arbitrary number of lines and can distribute the text across multiple columns or pages. The Textflow formatter supports a wealth of formatting functions. The following fragment creates text output with Textflow:
tf = p.add_textflow(-1, text, optlist);   /* -1 creates a new Textflow */
result = p.fit_textflow(tf, llx, lly, urx, ury, optlist);
p.delete_textflow(tf);
See Section 8.2, Multi-Line Textflows, page 179, for more information about Textflow. Text in tables. Textlines and Textflows can also be used to place text in table cells. See Section 8.3, Table Formatting, page 199, for more information about table features.
129
Note The position and size of superscript and subscript cannot be queried from PDFlib.
Cookbook A full code sample can be found in the Cookbook topic fonts/font_metrics_info.
CPI calculations. While most fonts have varying character widths, so-called monospaced fonts use the same width for all characters. In order to relate PDF font metrics to the characters per inch (CPI) measurements often used in high-speed print environments, some calculation examples for the monospaced Courier font may be helpful. In Courier, all characters have a width of 600 units with respect to the full character cell of 1000 units, which corresponds to the font size (the width value can be retrieved from the corresponding AFM metrics file). For example, with 12 point text all characters will have an absolute width of
12 points * 600/1000 = 7.2 points
with an optimal line spacing of 12 points. Since there are 72 points to an inch, exactly 10 characters of Courier 12 point will fit in an inch. In other words, 12 point Courier is a 10 cpi font. For 10 point text, the character width is 6 points, resulting in a 72/6 = 12 cpi font. Similarly, 8 point Courier results in 15 cpi.
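The CPI arithmetic above can be captured in a few lines. This worked sketch hard-codes the Courier width of 600 units per 1000-unit character cell; the class name CourierCpi is hypothetical, and other monospaced fonts would need their own width from the AFM file.

```java
// Worked CPI calculation for a monospaced font whose glyphs are all 600 units wide
// in a 1000-unit character cell, as with Courier.
final class CourierCpi {
    static final double GLYPH_WIDTH = 600.0;   // Courier advance width (from the AFM file)
    static final double UNITS_PER_EM = 1000.0; // full character cell
    static final double POINTS_PER_INCH = 72.0;

    // Absolute character width in points: fontsize * 600/1000.
    static double charWidth(double fontSize) {
        return fontSize * GLYPH_WIDTH / UNITS_PER_EM;
    }

    // Characters per inch: 72 / charWidth, computed in one division.
    static double cpi(double fontSize) {
        return POINTS_PER_INCH * UNITS_PER_EM / (fontSize * GLYPH_WIDTH);
    }

    // Inverse: the font size that yields a given cpi value.
    static double fontSizeForCpi(double cpi) {
        return POINTS_PER_INCH * UNITS_PER_EM / (cpi * GLYPH_WIDTH);
    }
}
```

cpi(12), cpi(10) and cpi(8) reproduce the 10, 12 and 15 cpi figures from the text, and fontSizeForCpi(10) returns 12 points.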
6.2.2 Kerning
Some character combinations can lead to unpleasant appearance. For example, two Vs next to each other can look like a W, and the distance between T and e must be reduced in order to avoid ugly white space. This compensation is referred to as kerning. Many fonts contain comprehensive kerning tables which contain spacing adjustment values for certain critical letter pairs. There are two PDFlib controls for the kerning behavior:
> By default, kerning information in a font is not read when loading a font. If kerning is desired, the readkerning option must be set in PDF_load_font( ). This instructs PDFlib to read the font's kerning data (if present in the font).
> Kerning for text output must be enabled with the kerning text appearance option which is supported by the text output functions.
Temporarily disabling kerning may be useful, for example, for tabular figures when the kerning data contains pairs of figures, since kerned figures wouldn't line up in a table.
[Figure: No kerning / Kerning applied]
Note that modern TrueType and OpenType fonts include special figures for this purpose which can be used with the Tabular Figures Feature and the option features={tnum}. Kerning will be applied in addition to any character spacing, word spacing, and horizontal scaling which may be active. PDFlib does not impose any limit on the number of kerning pairs in a font.
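As an illustration of how kerning combines with character spacing and horizontal scaling, the following hypothetical sketch computes the width of a string in points. The glyph widths and the kerning value for the pair Te are sample numbers rather than data from a real font, word spacing is not modeled, and horizontal scaling is given as a factor (1.0 for 100 percent) instead of the percentage used by PDFlib. Kerning values are expressed in units of 1/1000 of the font size; a negative value pulls the next glyph closer.

```java
import java.util.Map;

// Hypothetical sketch: text width with pair kerning, character spacing,
// and horizontal scaling (word spacing is not modeled).
final class KernedWidth {
    static double width(String text, Map<Character, Integer> widths,
                        Map<String, Integer> kernPairs, double fontSize,
                        double charSpacing, double horizScaling, boolean kerning) {
        double total = 0;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            double advance = widths.get(c); // glyph width in 1/1000 of the font size
            if (kerning && i + 1 < text.length()) {
                // Kerning adjusts the advance between this glyph and the next one.
                advance += kernPairs.getOrDefault("" + c + text.charAt(i + 1), 0);
            }
            // Character spacing is added per glyph; scaling applies to the whole advance.
            total += (advance * fontSize / 1000.0 + charSpacing) * horizScaling;
        }
        return total;
    }
}
```

With sample widths of 611 for T and 556 for e, the string Te set at 10 points is 11.67 points wide without kerning and 10.87 points wide with a kerning value of -80 for the pair.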
The fontstyle feature should not be confused with the similar concept of Windows font style names. While fontstyle only works under the conditions above and relies on Acrobat for simulating the artificial font style, the Windows style names are entirely based on the Windows font selection engine and cannot be used to simulate non-existent styles.
Cookbook A full code sample can be found in the Cookbook topic fonts/artificial_fontstyles.
Simulated bold fonts. While the fontstyle feature operates on a font, PDFlib supports an alternate mechanism for creating artificial bold text for individual text strings. This is controlled by the fakebold parameter or option.
Cookbook A full code sample can be found in the Cookbook topic fonts/simulated_fontstyles.
Simulated italic fonts. As an alternative to the fontstyle feature, the italicangle parameter or option can be used to simulate italic fonts when only a regular font is available. This method creates a fake italic font by skewing the regular font by a user-provided angle, and does not suffer from the fontstyle restrictions mentioned above. Negative values will slant the text clockwise. Be warned that using a real italic or oblique font will result in much more pleasing output. However, if an italic font is not available, the italicangle parameter or option can be used to easily simulate one. This feature may be especially useful for CJK fonts. Typical values for the italicangle parameter or option are in the range -12 to -15 degrees.
Note The italicangle parameter or option is not supported for vertical writing mode.
Shadow text. PDFlib can create a shadow effect which will generate multiple instances of text where each instance is placed at a slightly different location. Shadow text can be created with the shadow option of PDF_fit_textline( ). The color of the shadow, its position relative to the main text and graphics state parameters can be specified in suboptions.
Underline, overline, and strikeout text. PDFlib can be instructed to put lines below, above, or in the middle of text. The stroke width of the bar and its distance from the baseline are calculated based on the font's metrics information. In addition, the current values of the horizontal scaling factor and the text matrix are taken into account when calculating the width of the bar. The respective parameter names for PDF_set_parameter( ) can be used to switch the underline, overline, and strikeout feature on or off, as well as the corresponding options in the text output functions. The underlineposition and underlinewidth parameters and options can be used for fine-tuning. The current stroke color is used for drawing the bars. The current linecap and dash parameters are ignored, however.
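The skewing performed by the italicangle parameter or option can be pictured as a horizontal shear of glyph coordinates. The sketch below assumes the shear x' = x - y * tan(angle), which keeps the baseline fixed and, for a negative angle, moves the top of each glyph to the right; it illustrates the geometry only and is not a description of PDFlib internals. The class name ItalicSkew is hypothetical.

```java
// Illustrative shear for a simulated italic; the formula is an assumption, not PDFlib internals.
final class ItalicSkew {
    // Returns {x', y} with x' = x - y * tan(angle); the baseline (y = 0) stays fixed.
    // For a negative angle, points above the baseline move right: the text slants clockwise.
    static double[] skew(double x, double y, double angleDegrees) {
        double t = Math.tan(Math.toRadians(angleDegrees));
        return new double[] {x - y * t, y};
    }
}
```

With the typical value of -12 degrees, a point 1000 units above the baseline moves about 213 units to the right, while points on the baseline do not move.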
Aesthetics alert: in most fonts underlining will touch descenders, and overlining will touch diacritical marks atop ascenders.
Cookbook A full code sample can be found in the Cookbook topic text_output/starter_textline.
Text rendering modes. PDFlib supports several rendering modes which affect the appearance of text. This includes outline text and the ability to use text as a clipping path. Text can also be rendered invisibly, which may be useful for placing text on scanned images in order to make the text accessible to searching and indexing, while at the same time ensuring it will not be visible directly. The rendering modes are described in the PDFlib Reference, and can be set with the textrendering parameter or option. When stroking text, graphics state parameters such as linewidth and color will be applied to the glyph outline. The rendering mode has no effect on text displayed using a Type 3 font.
Cookbook Full code samples can be found in the Cookbook topics text_output/text_as_clipping_path and text_output/invisible_text.
Text color. Text will usually be displayed in the current fill color, which can be set using PDF_setcolor( ). However, if a rendering mode other than 0 has been selected, both stroke and fill color may affect the text depending on the selected rendering mode.
Cookbook A full code sample can be found in the Cookbook topic text_output/starter_textline.
European Alphabetic: Latin, Greek, Cyrillic
Middle Eastern: Arabic, Hebrew, Syriac, Thaana
South Asian (India): Devanagari, Bengali, Gurmukhi, Gujarati, Oriya, Tamil, Telugu, Kannada, Malayalam
Southeast Asian: Thai, Lao, Khmer
East Asian: Han, Hiragana, Katakana, Hangul
others
Other four-character codes according to the OpenType specification are also accepted, but are not currently supported. The full list can be found at the following location: www.microsoft.com/typography/developers/OpenType/scripttags.aspx
In this section we will discuss shaping for complex scripts in more detail. While most Western languages can be written by simply placing one character after the other from left to right, some writing systems (scripts) require additional processing:
> The Arabic and Hebrew scripts place text from right to left. Mixed text (e.g. Arabic with a Latin insert) contains both right-to-left and left-to-right segments. These segments must be reordered, which is referred to as the Bidi (bidirectional) problem.
> Some scripts, especially Arabic, use different character shapes depending on the position of the character (isolated, beginning/middle/end of a word).
> Mandatory ligatures replace sequences of characters.
> The position of glyphs must be adjusted horizontally and vertically.
> Indic scripts require reordering of some characters, i.e. characters may change their position in the text.
> Special word break and justification rules apply to some scripts.
Scripts which require one or more of these processing steps are called complex scripts. The process of preparing incoming logical text for proper presentation is called shaping (this term also includes reordering and Bidi processing). The user always supplies text in unshaped form and in logical order, while PDFlib performs the necessary shaping before producing PDF output.
Complex script shaping can be enabled with the shaping text option, which in turn requires the script option and optionally allows the language option. The following option list enables Arabic shaping (and Bidi processing):
shaping script=arab
Caveats. Note the following when working with complex script shaping:
> PDFlib does not automatically set the shaping and script options, but expects them to be supplied by the user.
> Script-specific shaping (options shaping, script, locale) will only be applied to glyphs within the same font, but not across glyphs from the base font and one or more fallback fonts if fallback fonts have been specified.
> Since shaping may reorder characters in the text, care must be taken regarding attribute changes within a word. For example, if you use inline options to colorize the second character in a word, what should happen when shaping swaps the first and second characters? For this reason, formatting changes should only be applied at word boundaries, but not within words.
Font and text requirements. A font for use with complex script shaping must meet the following requirements in addition to containing glyphs for the target script:
> It must be a TrueType or OpenType font with GDEF, GSUB, and GPOS feature tables and correct Unicode mappings appropriate for the target script. As an alternative to these OpenType tables, for the Arabic and Hebrew scripts the font may contain glyphs for the Unicode presentation forms; in this case internal tables will be used for the shaping process. For Thai text the font must contain contextual forms according to Microsoft, Apple, or Monotype Worldtype (e.g. used in some IBM products) conventions for Thai.
> If standard CJK fonts are to be used, the corresponding font file must be available.
> The font must be loaded with encoding=unicode or glyphid.
> The monospace and vertical options of PDF_load_font( ) must not be used, and the readshaping option must not be set to false.
> If the fallbackfonts option of PDF_load_font( ) was used, text in a single text run must not contain glyphs from a fallback font.
Script and language codes. The target script must be specified in the script text option, which supports the four-letter keywords listed in Table 6.1. Some examples:
script=arab
script=hebr
script=deva
script={lao }
Since a font may specify language-specific differences in formatting, you can specify the language option to select the natural language in which the text is written. The language option supports three-character language tags according to the OpenType specification, see
www.microsoft.com/typography/developers/OpenType/languagetags.aspx
However, only a few fonts contain language-specific script shaping tables. In most cases specifying the script option is therefore sufficient, and the language option will not improve shaping.
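The script values above follow PDFlib's option-list syntax: a value containing a space, such as the padded Lao keyword, must be enclosed in braces. The following Java sketch shows one way an application might assemble these text options before passing them to a Textline or Textflow call. The class and method names are hypothetical helpers, not part of the PDFlib API, and the length check simply encodes the four-character rule stated above.

```java
/**
 * Hypothetical helper (not part of the PDFlib API) that assembles the text
 * options for complex script shaping as a PDFlib-style option list string.
 */
final class ScriptOptions {
    private ScriptOptions() {}

    /** Encloses a value in braces if it contains a space, e.g. "lao " becomes "{lao }". */
    static String quote(String value) {
        return value.indexOf(' ') >= 0 ? "{" + value + "}" : value;
    }

    /**
     * Builds "shaping script=..." plus an optional "language=..." part.
     * Script keywords have exactly four characters, padded with spaces if needed.
     */
    static String shapingOptions(String script, String language) {
        if (script == null || script.length() != 4) {
            throw new IllegalArgumentException("script keyword must have four characters: " + script);
        }
        StringBuilder sb = new StringBuilder("shaping script=").append(quote(script));
        if (language != null) {
            sb.append(" language=").append(quote(language));
        }
        return sb.toString();
    }
}
```

For example, shapingOptions("lao ", null) yields shaping script={lao }, matching the last example above.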
6.3.2 Shaping
The shaping process selects appropriate glyph forms depending on whether a character is located at the start, middle, or end of a word, or in a standalone position. Shaping is a crucial component of Arabic text formatting. Shaping may also replace a sequence of two or more characters with a suitable ligature. Since the shaping process determines the appropriate character forms automatically, explicit ligatures and Unicode presentation forms (e.g. Arabic Presentation Forms-A U+FB50) must not be used as input characters.
Since complex scripts require multiple glyph forms per character and additional rules for selecting and placing these glyphs, shaping does not work with all kinds of fonts, but requires suitable fonts which contain the necessary information. Shaping works for TrueType and OpenType fonts which contain the required feature tables (see Font and text requirements, page 135, for detailed requirements).
Shaping can only be done for characters in the same font because the shaping information is specific to a particular font. Since it doesn't make sense, for example, to form ligatures across different fonts, complex script shaping cannot be applied to a word which contains characters from different fonts.
In some cases users may want to override the default shaping behavior. PDFlib supports several Unicode formatting characters for this purpose. For convenience, these formatting characters can also be specified with entity names (see Table 6.2).
Table 6.2 Formatting characters for overriding the shaping behavior
formatting character | entity name | Unicode name | function
U+200C | &ZWNJ; | ZERO WIDTH NON-JOINER | prevent the two adjacent characters from forming a cursive connection
U+200D | &ZWJ; | ZERO WIDTH JOINER | force the two adjacent characters to form a cursive connection
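The entity names in Table 6.2 are simply alternative spellings of the two Unicode formatting characters. The Java sketch below makes that equivalence concrete by expanding the entity names into the corresponding characters. It is a hypothetical helper for illustration only; it does not describe how PDFlib itself resolves entity names.

```java
/**
 * Illustrative sketch (not PDFlib code): expands the entity names from
 * Table 6.2 into the corresponding Unicode formatting characters.
 */
final class ShapingControls {
    static final char ZWNJ = '\u200C'; // ZERO WIDTH NON-JOINER: prevents a cursive connection
    static final char ZWJ = '\u200D';  // ZERO WIDTH JOINER: forces a cursive connection

    private ShapingControls() {}

    static String expandEntities(String text) {
        // "&ZWNJ;" does not contain "&ZWJ;", so the replacement order is safe.
        return text.replace("&ZWNJ;", String.valueOf(ZWNJ))
                   .replace("&ZWJ;", String.valueOf(ZWJ));
    }
}
```

Placing the non-joiner between two Arabic letters, as in expandEntities("\u0644&ZWNJ;\u0627"), asks the shaper not to join them cursively.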
1. See www.unicode.org/unicode/reports/tr9/
The default settings of various formatting options and Acrobat behavior are targeted at left-to-right text output. Use the following options for right-to-left text formatting and document display:
> Place a Textline right-aligned with the following fitting option:
position={right center}
> Create a leader between the text and the left border:
leader={alignment=left text=.}
> Use the following option of PDF_begin/end_document( ) to activate better right-to-left document and page display in Acrobat:
viewerpreferences={direction=r2l}
> You can use the startx/starty and endx/endy keywords of PDF_info_textline( ) to determine the coordinates of the logical start and end characters, respectively.
> In script/language combinations which require specific treatment of certain punctuation characters, e.g. the « and » guillemets used as quotation marks in French text, enable advanced line breaking. The following Textflow option list enables advanced line breaking for French text. As a result, the guillemet characters surrounding a word will not be split apart from the word at the end of a line:
<advancedlinebreak script=latn locale=fr>
Note that the locale Textflow option is somewhat different from the language text option: although the locale option can start with the same three-letter language identifier, it can optionally contain one or two additional parts. However, these will rarely be required for PDFlib.
Table 6.4 Standard CJK fonts and supported CMaps (encodings)
locale | font | supported CMaps (encodings)
Simplified Chinese | | GB-EUC-H, GB-EUC-V, GBpc-EUC-H, GBpc-EUC-V, GBK-EUC-H, GBK-EUC-V, GBKp-EUC-H, GBKp-EUC-V, GBK2K-H, GBK2K-V, UniGB-UCS2-H, UniGB-UCS2-V, UniGB-UTF16-H¹, UniGB-UTF16-V¹
Traditional Chinese | AdobeMingStd-Light² | B5pc-H, B5pc-V, HKscs-B5-H, HKscs-B5-V, ETen-B5-H, ETen-B5-V, ETenms-B5-H, ETenms-B5-V, CNS-EUC-H, CNS-EUC-V, UniCNS-UCS2-H, UniCNS-UCS2-V, UniCNS-UTF16-H¹, UniCNS-UTF16-V¹
Japanese | | 83pv-RKSJ-H, 90ms-RKSJ-H, 90ms-RKSJ-V, 90msp-RKSJ-H, 90msp-RKSJ-V, 90pv-RKSJ-H, Add-RKSJ-H, Add-RKSJ-V, EUC-H, EUC-V, Ext-RKSJ-H, Ext-RKSJ-V, H, V, UniJIS-UCS2-H, UniJIS-UCS2-V, UniJIS-UCS2-HW-H³, UniJIS-UCS2-HW-V³, UniJIS-UTF16-H¹, UniJIS-UTF16-V¹
Korean | AdobeMyungjoStd-Medium² | KSC-EUC-H, KSC-EUC-V, KSCms-UHC-H, KSCms-UHC-V, KSCms-UHC-HW-H, KSCms-UHC-HW-V, KSCpc-EUC-H, UniKS-UCS2-H, UniKS-UCS2-V, UniKS-UTF16-H¹, UniKS-UTF16-V¹
¹ Only available when generating PDF 1.5 or above.
² Only available when generating PDF 1.6 or above.
³ The HW CMaps are not allowed for the KozMinPro-Regular-Acro and KozGoPro-Medium-Acro fonts because these fonts contain only proportional ASCII characters, but not any halfwidth forms.
Horizontal and vertical writing mode. PDFlib supports both horizontal and vertical writing modes. For standard CJK fonts and CMaps the writing mode is selected along with the encoding by choosing the appropriate CMap name. CMaps with names ending in -H select horizontal writing mode, while the -V suffix selects vertical writing mode. Fonts with encodings other than a CMap can be used for vertical writing mode by supplying the vertical option to PDF_load_font( ). Font names starting with an @ character will always be processed in vertical mode.
Note Some PDFlib functions change their semantics according to the writing mode. For example, PDF_continue_text( ) should not be used in vertical writing mode, and the character spacing must be negative in order to spread characters apart in vertical writing mode.
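The writing-mode rules above can be summarized as a small decision function. The following Java sketch is a hypothetical helper, not part of the PDFlib API; it also treats the bare Japanese CMaps H and V from Table 6.4 as horizontal and vertical, and assumes that the vertical option only matters for encodings that are not -H/-V CMaps.

```java
/**
 * Hypothetical helper (not part of the PDFlib API) that mirrors the
 * writing-mode rules described above.
 */
final class WritingMode {
    private WritingMode() {}

    /**
     * @param fontname       font name as it would be passed to PDF_load_font( )
     * @param encoding       CMap name or other encoding keyword
     * @param verticalOption whether the vertical option is supplied; assumed here
     *                       to matter only for encodings that are not -H/-V CMaps
     */
    static boolean isVertical(String fontname, String encoding, boolean verticalOption) {
        if (fontname.startsWith("@")) {
            return true;                  // '@' prefix: always vertical
        }
        if (encoding.equals("V") || encoding.endsWith("-V")) {
            return true;                  // vertical CMap
        }
        if (encoding.equals("H") || encoding.endsWith("-H")) {
            return false;                 // horizontal CMap
        }
        return verticalOption;            // non-CMap encoding, e.g. "unicode"
    }
}
```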
Standard CJK font example. Standard CJK fonts can be selected with the PDF_load_font( ) interface, supplying the CMap name as the encoding parameter. However, you must take into account that a given CJK font supports only a certain set of CMaps (see Table 6.4), and that Unicode-aware language bindings support only UCS2-compatible CMaps. The KozMinPro-Regular-Acro sample in Table 6.4 can be generated with the following code:
font = p.load_font("KozMinPro-Regular-Acro", "UniJIS-UCS2-H", "");
if (font == -1) { ... }
p.setfont(font, 24);
p.set_text_pos(50, 500);
p.show("\u65E5\u672C\u8A9E");
These statements locate one of the Japanese standard fonts, choosing a Unicode-compatible CMap (UniJIS-UCS2-H) with horizontal writing mode (H). The fontname parameter must be the exact name of the font without any encoding or writing mode suffixes. The encoding parameter is the name of one of the supported CMaps (the choice depends on the font) and also indicates the writing mode (see above). PDFlib supports all of Acrobat's default CMaps, and will complain when it detects a mismatch between the requested font and the CMap. For example, PDFlib will reject a request to use a Korean font with a Japanese encoding.
Forcing monospaced fonts. Some applications are not prepared to deal with proportional CJK fonts, and calculate the extent of text based on a constant glyph width and the number of glyphs. PDFlib can be instructed to force monospaced glyphs even for fonts that usually have glyphs with varying widths. Use the monospace option of PDF_load_font( ) to specify the desired width for all glyphs. For standard CJK fonts the value 1000 gives pleasing results:
font = p.load_font("KozMinPro-Regular-Acro", "UniJIS-UCS2-H", "monospace=1000");
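Applications that rely on monospaced output compute the text extent as the number of glyphs times a constant glyph width. The sketch below shows that arithmetic, assuming the monospace value is expressed like PDF glyph widths, in thousandths of the font size; with monospace=1000 each glyph is then exactly one font size wide. The class is a hypothetical helper, not a PDFlib function.

```java
/**
 * Hypothetical helper: width of monospaced text, assuming the monospace
 * value is given in 1/1000 of the font size (like PDF glyph widths).
 */
final class MonospaceWidth {
    private MonospaceWidth() {}

    static double textWidth(int glyphCount, int monospace, double fontSize) {
        return glyphCount * (monospace / 1000.0) * fontSize;
    }
}
```

For the three-character Japanese string above at 24 points with monospace=1000, textWidth(3, 1000, 24) gives 72 units.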
Custom CJK font example with Japanese Shift-JIS text. The following C example uses the MS Mincho font to display some Japanese text which is supplied in Shift-JIS format according to Windows code page 932:
font = PDF_load_font(p, "MS Mincho", 0, "cp932", "");
if (font == -1) { ... }
PDF_setfont(p, font, 24);
PDF_set_text_pos(p, 50, 500);
PDF_show2(p, "\x82\xA9\x82\xC8\x8A\xBF\x8E\x9A", 8);
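The eight bytes passed to PDF_show2( ) above are the code page 932 (Shift-JIS) encoding of four Japanese characters. The following Java sketch decodes them with the JDK's windows-31j charset (Java's name for code page 932; standard JDK installations provide it, although it is not one of the charsets every Java platform must support) to show the equivalent Unicode text that a Unicode-aware binding would use instead. The class name is hypothetical.

```java
import java.nio.charset.Charset;

/** Decodes the Shift-JIS (code page 932) bytes from the C example above. */
final class ShiftJisDemo {
    private ShiftJisDemo() {}

    // Same byte sequence as "\x82\xA9\x82\xC8\x8A\xBF\x8E\x9A" in the C example.
    static final byte[] CP932_BYTES = {
        (byte) 0x82, (byte) 0xA9, (byte) 0x82, (byte) 0xC8,
        (byte) 0x8A, (byte) 0xBF, (byte) 0x8E, (byte) 0x9A
    };

    static String decodeCp932(byte[] bytes) {
        return new String(bytes, Charset.forName("windows-31j"));
    }
}
```

The decoded string consists of U+304B, U+306A, U+6F22, and U+5B57; in Java this text would be written as "\u304B\u306A\u6F22\u5B57" and used with the unicode encoding.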
Note that legacy encodings such as Shift-JIS are not supported in Unicode-aware language bindings.
Custom CJK font example with Chinese Unicode text. The following example uses the Arial Unicode MS font to display some Chinese text. The font must either be installed on the system or be configured according to Section 5.3.1, Searching for Fonts, page 110:
font = p.load_font("Arial Unicode MS", "unicode", "");
p.setfont(font, 24);
p.set_text_pos(50, 500);
p.show("\u4e00\u500b\u4eba");
Accessing individual fonts in a TrueType Collection (TTC). TTC files contain multiple separate fonts. You can access each font by supplying its proper name. However, if you don't know which fonts are contained in a TTC file, you can address each font numerically by appending a colon character and the number of the font within the TTC file (starting with 0). If the index is 0 it can be omitted, although the colon must still be present. For example, the TTC file msgothic.ttc contains multiple fonts which can be addressed as follows in PDF_load_font( ) (each line contains equivalent font names):
msgothic:0   MS Gothic   msgothic:
msgothic:1   MS PGothic
msgothic:2   MS UI Gothic
Note that msgothic (without any suffix) will not work as a font name since it does not uniquely identify a font. Font name aliases (see Sources of Font Data, page 110) can be used in combination with TTC indexing. If a font with the specified index cannot be found, the function call will fail. It is only required to configure the TTC font file once; all indexed fonts in the TTC file will be found automatically. The following code is sufficient to configure all indexed fonts in msgothic.ttc (see Section 5.3.1, Searching for Fonts, page 110):
p.set_parameter("FontOutline", "msgothic=msgothic.ttc");
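The TTC naming rules above (index after a colon, counting from 0, empty index meaning 0, no colon meaning a plain font name) can be expressed as a small parser. This Java sketch is a hypothetical helper that only illustrates the naming convention; it is not how PDFlib resolves font names internally.

```java
/**
 * Hypothetical parser for the "name:index" convention used to address
 * fonts inside a TrueType Collection. Illustration only, not PDFlib code.
 */
final class TtcFontName {
    final String base;
    final int index;

    private TtcFontName(String base, int index) {
        this.base = base;
        this.index = index;
    }

    /** Returns null if the name contains no colon, i.e. it is not an indexed TTC reference. */
    static TtcFontName parse(String fontname) {
        int colon = fontname.lastIndexOf(':');
        if (colon < 0) {
            return null;                   // e.g. "MS Gothic"; bare "msgothic" is not indexed
        }
        String base = fontname.substring(0, colon);
        String digits = fontname.substring(colon + 1);
        int index = digits.isEmpty() ? 0 : Integer.parseInt(digits); // "msgothic:" means index 0
        if (index < 0) {
            throw new IllegalArgumentException("negative TTC index: " + fontname);
        }
        return new TtcFontName(base, index);
    }
}
```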
Once a base font has been loaded with this fallback font configuration, the EUDC character can be used within the text without any need to change the font.
Preparing EUDC fonts. You can use the EUDC editor available in Windows to create custom characters for use with PDFlib. Proceed as follows:
> Use eudcedit.exe to create one or more custom characters at the desired Unicode position(s).
> Locate the EUDC.TTE file in the directory \Windows\fonts and copy it to some other directory. Since this file is invisible in Windows Explorer, use the dir and copy commands in a DOS box to find the file. Now configure the font for use with PDFlib, using one of the methods discussed in Section 5.3.1, Searching for Fonts, page 110:
p.set_parameter("FontOutline", "EUDC=EUDC.TTE");
p.set_parameter("SearchPath", "...directory name...");
or place EUDC.TTE in the current directory. As an alternative to this explicit font file configuration you can use the following function call to configure the font file directly from the Windows directory. This way you will always access the current EUDC font used in Windows:
p.set_parameter("FontOutline", "EUDC=C:\\Windows\\fonts\\EUDC.TTE");
> Integrate the EUDC font into any base font using the fallbackfonts option as described above. If you want to access the font directly, use the following call to load the font in PDFlib:
font = p.load_font("EUDC", "unicode", "");
Then output the characters as usual by supplying the Unicode value(s) chosen in the first step.
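Table 6.5 Supported OpenType layout features for CJK text
name | description
fwid | full widths
hkna | horizontal kana alternates
hngl | hangul
hojo | Hojo kanji forms (JIS X 0212-1990)
hwid | half widths
ital | italics
jp04 | JIS2004 forms
jp78 | JIS78 forms
jp83 | JIS83 forms
jp90 | JIS90 forms
nalt | alternate annotation forms
nlck | NLC kanji forms
pkna | proportional kana
pnum | proportional figures
pwid | proportional widths
qwid | quarter widths
ruby | ruby notation forms
smpl | simplified forms
tnam | traditional name forms
trad | traditional forms
twid | third widths
vkna | vertical kana alternates
vrt2 | vertical alternates and rotation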
PDFlib supports the following groups of OpenType features:
> OpenType features for Western typography listed in Table 6.6; these are controlled by the features option.
> OpenType features for Chinese, Japanese, and Korean text output listed in Table 6.5; these are also controlled by the features option, and are discussed in more detail in Section 6.4.4, OpenType Features for improved CJK Text Output, page 144.
> OpenType features for complex script shaping; these are automatically evaluated subject to the shaping and script options (see Section 6.3, Complex Script Output, page 134).
> OpenType feature tables for kerning; however, PDFlib does not offer any controls for kerning as an OpenType feature because kerning data in a font may also be represented by means other than OpenType features. Use the readkerning font option and the kerning text option instead to control kerning.
More detailed descriptions of OpenType layout features can be found at
www.microsoft.com/typography/otspec/features_ae.htm
Tools for identifying OpenType features. You can identify OpenType feature tables with the following tools:
> The FontLab font editor is an application for creating and editing fonts. The free demo version (www.fontlab.com) displays and previews OpenType features.
> DTL OTMaster Light (www.fonttools.org) is a free application for viewing and analyzing fonts, including their OpenType feature tables.
> Microsoft's free font properties extension¹ displays a list of OpenType features available in a font (see Figure 6.3).
> PDFlib's PDF_info_font( ) interface can also be used to query OpenType features (see Querying OpenType features programmatically, page 149).
1. See www.microsoft.com/typography/TrueTypeProperty21.mspx
Table 6.6 Supported OpenType layout features for Western typography; Table 6.5 lists OpenType features for CJK text
name | description
_none | all features disabled
afrc | alternative fractions
c2pc | petite capitals from capitals
c2sc | small capitals from capitals
case | case-sensitive forms
dlig | discretionary ligatures
dnom | denominators
expt | expert forms
frac | fractions
hist | historical forms
hlig | historical ligatures
liga | standard ligatures
lnum | lining figures
mgrk | mathematical Greek
numr | numerators
onum | oldstyle figures
ordn | ordinals
ornm | ornaments
pcap | petite capitals
salt | stylistic alternates
sinf | scientific inferiors
smcp | small capitals
ss01-ss20 | stylistic set 1-20
subs | subscript
sups | superscript
swsh | swash
titl | titling
tnum | tabular figures
unic | unicase
zero | slashed zero
Enabling and disabling OpenType features. You can enable and disable OpenType features for pieces of text as required. Use the features text option to enable features by supplying their name, and disable them by prepending no to the feature name. For example, with inline option lists for Textflow, feature control works as follows:
<features={liga}>ffi<features={noliga}>
OpenType features can also be enabled as Block properties for use with the PDFlib Personalization Server (PPS).
More than one feature can be applied to the same text, but the feature tables in the font must be prepared for this situation and must contain the corresponding feature lookups in the proper order. For example, consider the word "office", and the ligature (liga) and small cap (smcp) features. If both features are enabled (assuming the font contains corresponding feature entries), you'd expect the small cap feature to be applied, but not the ligature feature. If this is correctly implemented in the font tables, PDFlib will generate the expected output, i.e. small caps without any ligature.
Script-specific OpenType layout features. OpenType features may apply in all situations, or can be implemented for a particular script or even a particular script and language combination. For this reason the script and language text options can optionally be supplied along with the features option. They will have a noticeable effect only if the feature is implemented in a script- or language-specific manner in the font. As an example, the ligature for the f and i glyphs is not available in some fonts if the Turkish language is selected (since the ligated form of i could be confused with the dotless i which is very common in Turkish). Using such a font, the following Textflow option will create a ligature since no script/language is specified:
<features={liga}>fi
However, the following Textflow option list will not create any ligature due to the Turkish language option:
<script=latn language=TRK features={liga}>fi
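The option lists above combine optional script and language options with a features list in which a no prefix disables a feature. The following Java sketch assembles such strings. It is a hypothetical helper, not part of the PDFlib API, and it assumes that enabled and disabled feature names may be mixed in a single features list.

```java
import java.util.List;

/**
 * Hypothetical helper that builds text options such as "features={liga}"
 * or "script=latn language=TRK features={liga}". Disabled features get the
 * "no" prefix, as in "noliga". Assumes a mixed list is acceptable.
 */
final class FeatureOptions {
    private FeatureOptions() {}

    static String build(List<String> enable, List<String> disable, String script, String language) {
        StringBuilder sb = new StringBuilder();
        if (script != null) {
            sb.append("script=").append(script).append(' ');
        }
        if (language != null) {
            sb.append("language=").append(language).append(' ');
        }
        StringBuilder list = new StringBuilder();
        for (String f : enable) {
            list.append(list.length() == 0 ? "" : " ").append(f);
        }
        for (String f : disable) {
            list.append(list.length() == 0 ? "" : " ").append("no").append(f);
        }
        return sb.append("features={").append(list).append('}').toString();
    }
}
```

For instance, build(List.of("liga"), List.of(), "latn", "TRK") reproduces the Turkish example above.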
See Script and language codes, page 136, for details on supported script and language keywords.
Combining OpenType features and shaping. Shaping for complex scripts (see Section 6.3, Complex Script Output, page 134) heavily relies on OpenType font features which will be selected automatically. However, for some fonts it may make sense to combine OpenType features selected automatically for shaping with OpenType features which have been selected by the client application. Note the following with respect to this combination:
> PDFlib will first apply user-selected OpenType features before applying shaping-related features automatically.
> In Textlines the script-specific processing of OpenType features may produce unexpected results. For example, Latin ligatures don't work in combination with Arabic text within the same Textline. The reason is that the script option can be supplied only once for the contents of a Textline and affects both the shaping and feature options:
shaping script=arab features={liga} WRONG, will not work with common fonts!
However, Arabic fonts typically don't contain Latin ligatures with an Arabic script designation, but only for the default or Latin script, and the script option cannot be changed within a single Textline. Because of this PDFlib will not find any ligatures and will emit plain characters instead.
Querying OpenType features programmatically. You can query OpenType features in a font programmatically with PDF_info_font( ). The following statement retrieves a space-separated list with all OpenType features which are available in the font and are supported by PDFlib:
result = (int) p.info_font(font, "featurelist", "");
if (result != -1) {
    /* retrieve string containing space-separated feature list */
    featurelist = p.get_parameter("string", result);
} else {
    /* no supported features found */
}
Use the following statement to check whether PDFlib and the test font support a particular feature, e.g. ligatures (liga):
result = (int) p.info_font(font, "feature", "name=liga");
if (result == 1) {
    /* feature supported by font and PDFlib */
}
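Once the space-separated feature list has been retrieved as a string (as in the featurelist example above), an application will typically split it into a set for repeated lookups. The following Java sketch shows that post-processing step; the class is a hypothetical helper, and a null argument stands for the case where no supported features were found.

```java
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical helper that turns a space-separated feature list into a set. */
final class FeatureList {
    private FeatureList() {}

    static Set<String> parse(String featurelist) {
        Set<String> features = new LinkedHashSet<>(); // keeps the order reported by the font
        if (featurelist == null) {
            return features;                          // no supported features found
        }
        for (String f : featurelist.trim().split("\\s+")) {
            if (!f.isEmpty()) {
                features.add(f);
            }
        }
        return features;
    }
}
```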
The last parameter of the PDF_fit_image( ) function is an option list which supports a variety of options for positioning, scaling, and rotating the image. Details regarding these options are discussed in Section 7.3, Placing Images and imported PDF Pages, page 164.
Cookbook A full code sample can be found in the Cookbook topic images/starter_image.
Re-using image data. PDFlib supports an important PDF optimization technique for using repeated raster images. Consider a layout with a constant logo or background on multiple pages. In this situation it is possible to include the actual image data only once in the PDF, and generate only a reference on each of the pages where the image is used. Simply load the image file once, and call PDF_fit_image( ) every time you want to place the logo or background on a particular page. You can place the image on multiple pages, or use different scaling factors for different occurrences of the same image (as long as the image hasn't been closed). Depending on the image's size and the number of occurrences, this technique can result in enormous space savings.
Scaling and dpi calculations. PDFlib never changes the number of pixels in an imported image. Scaling either enlarges or shrinks image pixels, but doesn't perform any downsampling. A scaling factor of 1 results in a pixel size of 1 unit in user coordinates. In other words, the image will be imported with its native resolution (or 72 dpi if it doesn't contain any resolution information) if the user coordinate system hasn't been scaled (since there are 72 default units to an inch).
Cookbook A full code sample can be found in the Cookbook topic images/image_dimensions. It shows how to get the dimensions of an image and how to place it with various sizes.
Color space of imported images. Except for adding or removing ICC profiles and applying a spot color according to the options provided in PDF_load_image( ), PDFlib will generally try to preserve the native color space of an imported image. However, this is not possible for certain rare combinations, such as YCbCr in TIFF which will be converted to RGB. PDFlib does not perform any conversion between RGB and CMYK. If such a conversion is required it must be applied to the image data before loading the image in PDFlib.
Multi-page images. PDFlib supports GIF, TIFF and JBIG2 images with more than one image, also known as multi-page image files. In order to use multi-page images use the page option in PDF_load_image( ):
image = p.load_image("tiff", filename, "page=2");
The page option indicates that a multi-image file is to be used, and specifies the number of the image to use. The first image is numbered 1. The page number may be incremented until PDF_load_image( ) returns -1, signalling that no more images are available in the file.
Cookbook A full code sample for converting all images in a multi-image TIFF file to a multi-page PDF file can be found in the Cookbook topic images/multi_page_tiff.
Inline images. As opposed to reusable images, which are written to the PDF output as image XObjects, inline images are written directly into the respective content stream (page, pattern, template, or glyph description). This results in some space savings, but should only be used for small amounts of image data (up to 4 KB), as recommended in the PDF reference. The primary use of inline images is for bitmap glyph descriptions in Type 3 fonts.
Inline images can be generated with the PDF_load_image( ) interface by supplying the inline option. Inline images cannot be reused, i.e., the corresponding handle must not be supplied to any call which accepts image handles. For this reason, if the inline option has been provided, PDF_load_image( ) internally performs the equivalent of the following code:
p.fit_image(image, 0, 0, "");
p.close_image(image);
Inline images are only supported for imagetype=ccitt, jpeg, and raw. For other image types the inline option will silently be ignored.
OPI support. When loading an image, additional information according to OPI (Open Prepress Interface) version 1.3 or 2.0 can be supplied in the call to PDF_load_image( ). PDFlib accepts all standard OPI 1.3 or 2.0 PostScript comments as options (not the corresponding PDF keywords!), and will pass through the supplied OPI information to the generated PDF output without any modification. The following example attaches OPI information to an image:
String optlist13 =
    "OPI-1.3 { ALDImageFilename bigfile.tif " +
    "ALDImageDimensions {400 561} " +
    "ALDImageCropRect {10 10 390 550} " +
    "ALDImagePosition {10 10 10 540 390 540 390 10} }";
Note Some OPI servers, such as the one included in Helios EtherShare, do not properly implement OPI processing for PDF Image XObjects, which PDFlib generates by default. In such cases generation of Form XObjects can be forced by supplying the template option to PDF_load_image( ).
XMP metadata in images. Image files may contain XMP metadata. PDFlib will honor image metadata for images in the TIFF, JPEG, and JPEG 2000 formats by default. The XMP metadata will be attached to the generated image in the output PDF document. While keeping image metadata is usually recommended, it can be disabled to reduce the output file size. Use the following option of PDF_load_image( ) to drop XMP metadata which may be present in an imported image:
metadata={keepxmp=false}
Since invalid XMP will prevent an image from being loaded, the option list above can also be used to work around invalid XMP.
> Progressive JPEG compression.
JPEG images can be packaged in several different file formats. PDFlib supports all common JPEG file formats, and will read resolution information from the following flavors:
> JFIF, which is generated by a wide variety of imaging applications.
> JPEG files written by Adobe Photoshop and other Adobe applications. PDFlib applies a workaround which is necessary to correctly process Photoshop-generated CMYK JPEG files.
PDFlib will also read clipping paths from JPEG images created with Adobe Photoshop.
Note PDFlib does not interpret color space or resolution information from JPEG images in the SPIFF or Exif formats.
JPEG 2000 images. JPEG 2000 images (ISO 15444-2) require PDF 1.5 or above, and are always handled in pass-through mode. PDFlib supports JPEG 2000 images as follows:
> JP2 and JPX baseline images (usually *.jp2 or *.jpf) are supported, subject to the color space conditions below. All valid color depth values are supported. The following color spaces are supported: sRGB, sRGB-grey, ROMM-RGB, sYCC, e-sRGB, e-sYCC, CIELab, ICC-based color spaces (restricted and full ICC profile), and CMYK. PDFlib will not alter the original color space in the JPEG 2000 image file.
> Raw JPEG 2000 code streams without JPX wrapper (often *.j2k) with 1, 3, or 4 color components are supported.
> Images containing a soft mask can be used with the mask option to prepare a mask which can be applied to other images.
> External ICC profiles cannot be applied to a JPEG 2000 image, i.e. the iccprofile option must not be used. ICC profiles contained in the JPEG 2000 image file will always be kept, i.e. the honoriccprofile option is always true.
Note JPM compound image files according to ISO 15444-6 (usually *.jpm) are not supported.
JBIG2 images. PDFlib supports single- and multi-page flavors of JBIG2 images (ISO 14492). JBIG2 images always contain black/white pixel data and require PDF 1.4 or above.
Due to the nature of JBIG2 compression, several pages in a multi-page JBIG2 stream may refer to the same global segments. If more than one page of a multi-page JBIG2 stream is converted, the global segments can be shared among the generated PDF images. Since the calls to PDF_load_image( ) are independent from each other, you must inform PDFlib in advance that multiple pages from the same JBIG2 stream will be converted. This works as follows:
> When loading the first page all global segments are copied to the PDF. Use the following option list for PDF_load_image( ):
page=1 copyglobals=all
> When loading subsequent pages from the same JBIG2 stream, the image handle <N> for page 1 must be provided so that PDFlib can create references to the global segments which have been copied with page 1. Use the following option list for PDF_load_image( ):
page=2 imagehandle=<N>
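When several pages of one JBIG2 stream are converted in a loop, the option list differs only between page 1 and the following pages. The Java sketch below produces the two option list shapes shown above; the class is a hypothetical helper, and the caller is assumed to pass the handle returned by PDF_load_image( ) for page 1.

```java
/**
 * Hypothetical helper producing PDF_load_image( ) option lists for pages of
 * one JBIG2 stream: page 1 copies the global segments, later pages refer to
 * them through the image handle returned for page 1.
 */
final class Jbig2Options {
    private Jbig2Options() {}

    static String forPage(int page, int firstPageHandle) {
        if (page < 1) {
            throw new IllegalArgumentException("JBIG2 pages are numbered from 1");
        }
        if (page == 1) {
            return "page=1 copyglobals=all";  // firstPageHandle is not used yet
        }
        return "page=" + page + " imagehandle=" + firstPageHandle;
    }
}
```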
The client application must make sure that the copyglobals/imagehandle mechanism is only applied to multiple pages which are extracted from the same JBIG2 image stream. Without the copyglobals option PDFlib will automatically copy all required data for the current page.
GIF images. PDFlib supports all GIF flavors (specifically GIF 87a and 89a) with interlaced and non-interlaced pixel data and all palette sizes. GIF images will always be recompressed with Flate compression.
TIFF images. PDFlib imports most TIFF images. PDFlib supports the following flavors of TIFF images:
> compression schemes: uncompressed, CCITT (group 3, group 4, and RLE), ZIP (=Flate), and PackBits (=RunLength) are handled in pass-through mode; other compression schemes, such as LZW and JPEG, are handled by uncompressing.
> color: black and white, grayscale, RGB, CMYK, CIELab, and YCbCr images;
> TIFF files containing more than one image (see Multi-page images, page 152);
> Color depth must be 1, 2, 4, 8, or 16 bits per color sample. In PDF 1.5 mode 16-bit color depth will be retained in most cases with pass-through mode, but reduced to 8 bit for certain image files (ZIP compression with little-endian/Intel byte order and 16-bit palette images).
PDFlib fully interprets the orientation tag which specifies the desired image orientation in some TIFF files. PDFlib can be instructed to ignore the orientation tag (as many applications do) by setting the ignoreorientation option to true.
PDFlib honors clipping paths in TIFF images created with Adobe Photoshop and compatible programs unless the ignoreclippingpath option is set.
Some TIFF features (e.g., spot color) and certain combinations of features (e.g., CMYK images with a mask) are not supported. Although TIFF images with JPEG compression are generally supported, some flavors of so-called old-style TIFF-JPEG will be rejected.
BMP images. BMP images cannot be handled in pass-through mode.
PDFlib supports the following flavors of BMP images:
> BMP versions 2 and 3;
> color depth 1, 4, and 8 bits per component, including 3 x 8 = 24 bit TrueColor. 16-bit images will be treated as 5+5+5 plus 1 unused bit. 32-bit images will be treated as 3 x 8 bit images (the remaining 8 bits will be ignored).
> black and white or RGB color (indexed and direct);
> uncompressed as well as 4-bit and 8-bit RLE compression;
> PDFlib will not mirror images if the pixels are stored in bottom-up order (this is a rarely used feature in BMP which is interpreted differently in applications).
CCITT images. Group 3 or Group 4 fax compressed image data are always handled in pass-through mode. Note that this format actually means raw CCITT-compressed image data, not TIFF files using CCITT compression. Raw CCITT compressed image files are usually not supported in end-user applications, but can only be generated with fax-related software. Since PDFlib is unable to analyze CCITT images, all relevant image parameters have to be passed to PDF_load_image( ) by the client.
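The treatment of 16-bit BMP images as 5+5+5 bits plus 1 unused bit can be illustrated by unpacking a single pixel. The following Java sketch assumes the common BMP layout with the unused bit in the most significant position (x rrrrr ggggg bbbbb) and expands each 5-bit value to 0..255 by bit replication; it is a hypothetical helper and not necessarily how PDFlib converts such pixels.

```java
/**
 * Sketch: unpacks a 16-bit BMP pixel as 5+5+5 bits plus one unused bit,
 * assuming the layout x rrrrr ggggg bbbbb (unused bit is the top bit).
 */
final class Bmp16 {
    private Bmp16() {}

    /** Returns {r, g, b}, each scaled from 5 bits to the range 0..255. */
    static int[] toRgb8(int pixel) {
        int r = (pixel >> 10) & 0x1F;
        int g = (pixel >> 5) & 0x1F;
        int b = pixel & 0x1F;             // bit 15 is never read, i.e. ignored
        // Replicate the top bits so that 0 maps to 0 and 31 maps to 255.
        return new int[] { (r << 3) | (r >> 2), (g << 3) | (g >> 2), (b << 3) | (b >> 2) };
    }
}
```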
Raw data. Uncompressed (raw) image data may be useful for some special applications. The nature of the image is deduced from the number of color components: 1 component implies a grayscale image, 3 components an RGB image, and 4 components a CMYK image.
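Since raw data carries no header, the client has to know how the component count maps to a color space and how many bytes the buffer must contain. The Java sketch below encodes the 1/3/4 component rule from the paragraph above and computes the expected buffer length. The class is a hypothetical helper, and the byte-boundary padding of each row is an assumption borrowed from the PDF image data convention.

```java
/**
 * Hypothetical helpers for raw image data. The color space follows from the
 * number of components; the length calculation assumes each row of samples
 * starts on a byte boundary, as in PDF image data.
 */
final class RawImage {
    private RawImage() {}

    static String colorSpaceFor(int components) {
        switch (components) {
            case 1: return "DeviceGray";
            case 3: return "DeviceRGB";
            case 4: return "DeviceCMYK";
            default: throw new IllegalArgumentException("unsupported component count: " + components);
        }
    }

    static long expectedLength(int width, int height, int components, int bpc) {
        long bitsPerRow = (long) width * components * bpc;
        long bytesPerRow = (bitsPerRow + 7) / 8;   // pad each row to a full byte
        return bytesPerRow * height;
    }
}
```

For example, a 100 x 50 RGB image with 8 bits per component needs 15000 bytes.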
> GIF image files may contain a single transparent color value (palette entry) which is respected by PDFlib.
> TIFF images may contain a single associated alpha channel which will be honored by PDFlib. Alternatively, a TIFF image may contain an arbitrary number of unassociated channels which are identified by name. These channels may be used to convey transparency or other information. When unassociated channels are found in a TIFF image, PDFlib will by default use the first channel as alpha channel. However, you can explicitly select an unassociated alpha channel by supplying its name:
image = p.load_image("tiff", filename, "alphachannelname={apple}");
> PNG images may contain an associated alpha channel which will automatically be used by PDFlib.
> As an alternative to a full alpha channel, PNG images may contain single transparent color values which will be honored by PDFlib. If multiple color values with an attached alpha value are given, only the first one with an alpha value below 50 percent is used.
Sometimes it is desirable to ignore any implicit transparency which may be contained in an image file. PDFlib's transparency support can be disabled with the ignoremask option when loading the image:
image = p.load_image("tiff", filename, "ignoremask");
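The PNG rule described above (only the first color value whose alpha is below 50 percent is used) amounts to a simple first-match search. The Java sketch below assumes 8-bit alpha values (0..255); it is a hypothetical helper that illustrates the selection rule, not PDFlib's PNG reader.

```java
/**
 * Sketch of the selection rule for PNG images with several transparent
 * color values: the first entry with alpha below 50 percent is used.
 * Alpha values are assumed to be 8-bit (0..255).
 */
final class PngTransparency {
    private PngTransparency() {}

    /** Returns the index of the selected entry, or -1 if no entry qualifies. */
    static int selectTransparentEntry(int[] alpha) {
        for (int i = 0; i < alpha.length; i++) {
            if (alpha[i] * 2 < 255) {     // alpha/255 < 0.5, i.e. alpha <= 127
                return i;
            }
        }
        return -1;
    }
}
```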
Explicit transparency. The explicit case requires two steps, both of which involve image operations. First, a grayscale image must be prepared for later use as a mask. This is accomplished by loading the image with the mask option. In PDF 1.3, which supports only 1-bit masks, using this option is required; in PDF 1.4 and above it is optional. The following kinds of images can be used for constructing a mask:
> PNG images
> TIFF images: the nopassthrough option for PDF_load_image( ) is recommended to avoid multi-strip images.
> raw image data
Pixel values of 0 (zero) in the mask will result in the corresponding area of the masked image being painted, while high pixel values result in the background shining through. If the mask has more than 1 bit per pixel, intermediate values will blend the foreground image against the background, providing a transparency effect. In the second step the mask is applied to another image:
mask = p.load_image("png", maskfilename, "mask");
if (mask == -1)
    throw new Exception("Error: " + p.get_errmsg());
String optlist = "masked=" + mask;
image = p.load_image(type, filename, optlist);
if (image == -1)
    throw new Exception("Error: " + p.get_errmsg());
p.fit_image(image, x, y, "");
Note the different use of the option list for PDF_load_image( ): mask for defining a mask, and masked for applying a mask to another image.
The image and the mask may have different pixel dimensions; the mask will automatically be scaled to the image's size.

Note PDFlib converts multi-strip TIFF images to multiple PDF images, which would be masked individually. Since this is usually not intended, such images will be rejected both as a mask and as a masked target image. Also, it is important not to mix the implicit and explicit cases, i.e., don't use images with transparent color values as a mask.

Note The mask must have the same orientation as the underlying image; otherwise it will be rejected. Since the orientation depends on the image file format and other factors, it is difficult to detect. For this reason it is recommended to use the same file format and creation software for both mask and image.

Cookbook A full code sample can be found in the Cookbook topic images/image_mask.

Image masks and soft masks. Image masks are images with a bit depth of 1 (bitmaps) in which zero bits are treated as transparent: whatever contents already exist on the page will shine through the transparent parts of the image. 1-bit pixels are colorized with the current fill color. Soft masks (PDF 1.4 and above) generalize the concept of image masks to masks with more than 1 bit. They blend the image against some existing background. PDFlib accepts all kinds of single-channel (grayscale) images as soft masks. They can be used the same way as image masks. The following kinds of images can be used as image masks:
> PNG images
> TIFF images (single- or multi-strip)
> JPEG images (only as soft mask, see below)
> BMP; note that BMP images are oriented differently than other image types. For this reason BMP images must be mirrored along the x axis before they can be used as a mask.
> raw image data

Image masks are simply opened with the mask option, and placed on the page after the desired fill color has been set:
mask = p.load_image("tiff", maskfilename, "mask");

/* set the fill color which will be used to colorize the image mask */
p.setcolor("fill", "rgb", 1.0, 0.0, 0.0, 0.0);

if (mask != -1) {
    p.fit_image(mask, x, y, "");
}
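Mirroring a mask along the x axis, as required for BMP images above, amounts to reversing the order of the pixel rows. If you prepare such mask data in your own code, the following hypothetical helper sketches the operation. It assumes uncompressed rows of equal length stored one after another, and it is not part of the PDFlib API.

```java
/** Hypothetical helper: mirrors packed raw image data along the x axis. */
class MaskRows {
    /**
     * Reverses the order of the rows. data holds height rows of rowBytes
     * bytes each, stored one after another.
     */
    static byte[] flipVertically(byte[] data, int rowBytes, int height) {
        if (data.length != rowBytes * height) {
            throw new IllegalArgumentException("data length does not match rowBytes * height");
        }
        byte[] flipped = new byte[data.length];
        for (int row = 0; row < height; row++) {
            // Input row 0 becomes the last output row, and so on.
            System.arraycopy(data, row * rowBytes, flipped, (height - 1 - row) * rowBytes, rowBytes);
        }
        return flipped;
    }
}
```

For three rows of two bytes each, {1, 2, 3, 4, 5, 6} becomes {5, 6, 3, 4, 1, 2}.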
If you want to apply a color to an image without the zero bit pixels being transparent you must use the colorize option (see Section 7.1.5, Colorizing Images, page 158).
In order to colorize an image with a spot color you must supply the colorize option when loading the image, and supply the respective spot color handle which must have been retrieved with PDF_makespotcolor( ):
/* the current fill color serves as alternate color for the spot color */
p.setcolor("fillstroke", "cmyk", 1, .79, 0, 0);
spot = p.makespotcolor("PANTONE Reflex Blue CV");

String optlist = "colorize=" + spot;
image = p.load_image("tiff", "image.tif", optlist);
if (image != -1) {
    p.fit_image(image, x, y, "");
}
You cannot re-use individual elements of imported pages with other PDFlib functions. For example, re-using fonts from imported documents for some other content is not possible. Instead, all required fonts must be configured in PDFlib. If multiple imported documents contain embedded font data for the same font, PDI will not remove any duplicate font data. On the other hand, if fonts are missing from some imported PDF, they will also be missing from the generated PDF output file. As an optimization you should keep the imported document open as long as possible in order to avoid embedding the same fonts multiple times in the output document.

PDI does not change the color of imported PDF documents in any way. For example, if a PDF contains ICC color profiles these will be retained in the output document.

PDFlib uses the template feature for placing imported PDF pages on the output page. Since some third-party PDF software does not correctly support templates, restrictions may apply in certain environments other than Acrobat (see Section 3.2.4, Templates, page 64).

PDFlib-generated output which contains imported pages from other PDF documents can be processed with PDFlib+PDI again. However, due to restrictions in PostScript printing the nesting level should not exceed 10.

Code fragments for importing PDF pages. Dealing with pages from existing PDF documents is possible with a very simple code structure. The following code snippet opens a page from an existing document, and copies the page contents to a new page in the output PDF document (which must have been opened before):
int doc, page, pageno = 1;
String infilename = "input.pdf";

if (p.begin_document(outfilename, "") == -1)
    throw new Exception("Error: " + p.get_errmsg());
/* ... */
doc = p.open_pdi_document(infilename, "");
if (doc == -1)
    throw new Exception("Error: " + p.get_errmsg());

page = p.open_pdi_page(doc, pageno, "");
if (page == -1)
    throw new Exception("Error: " + p.get_errmsg());

/* dummy page size, will be modified by the adjustpage option */
p.begin_page_ext(20, 20, "");

p.fit_pdi_page(page, 0, 0, "adjustpage");
p.close_pdi_page(page);

/* ...add more content to the page using PDFlib functions... */

p.end_page_ext("");
p.close_pdi_document(doc);
The last parameter to PDF_fit_pdi_page( ) is an option list which supports a variety of options for positioning, scaling, and rotating the imported page. Details regarding these options are discussed in Section 7.3, Placing Images and imported PDF Pages, page 164.

Dimensions of imported PDF pages. Imported PDF pages are treated much like imported raster images, and can be placed on the output page using PDF_fit_pdi_page( ). By default, PDI will import the page exactly as it is displayed in Acrobat, in particular:
> cropping will be retained (in technical terms: if a CropBox is present, PDI favors the CropBox over the MediaBox; see Section 3.2.2, Page Size, page 61);
> rotation which has been applied to the page will be retained.

Alternatively, you can use the pdiusebox option to explicitly instruct PDI to use any of the MediaBox, CropBox, BleedBox, TrimBox or ArtBox entries of a page (if present) for determining the size of the imported page.

Imported PDF pages with layers. Acrobat 6 (PDF 1.5) introduced the layer functionality (technically known as optional content). PDI will ignore any layer information which may be present in a file. All layers in the imported page, including invisible layers, will be visible in the generated output.

Importing GeoPDF with PDI. When importing GeoPDF with PDI the geospatial information will be kept if it has been created with one of the following methods (image-based geospatial reference):
> with PDFlib and the georeference option of PDF_load_image( )
> by importing an image with geospatial information in Acrobat.

The geospatial information will be lost after importing a page if it has been created with one of the following methods (page-based geospatial reference):
> with PDFlib and the viewports option of PDF_begin/end_page_ext( )
> by manually geo-registering a PDF page in Acrobat.

Imported PDF with OPI information. OPI information present in the input PDF will be retained in the output unmodified.
Optimization across multiple imported documents. While PDFlib itself creates highly optimized PDF output, imported PDF may contain redundant data structures which can be optimized. In addition, importing multiple PDFs may bloat the output file size when multiple files contain identical resources, e.g. fonts. In this situation you can use the optimize option of PDF_begin_document( ). It will detect redundant objects in imported files, and remove them without affecting the visual appearance or quality of the generated output.
The following kinds of PDF documents will be rejected by default; however, they can be opened for querying information with pCOS (as opposed to importing pages) by setting the infomode option to true:
> PDF documents which use a higher PDF version number than the PDF output document that is currently being generated cannot be imported with PDI. The reason is that PDFlib can no longer make sure that the output will actually conform to the requested PDF version after a PDF with a higher version number has been imported. Solution: set the version of the output PDF to the required level using the compatibility option in PDF_begin_document( ). As an exception to the rule "input PDF version must not exceed output PDF version", PDI will accept PDF 1.7ext 3 (i.e. Acrobat 9) documents even when generating PDF 1.7 output.
> Encrypted PDF documents without the corresponding password (exception: PDF 1.6 documents created with the Distiller setting Object Level Compression: Maximum; these cannot be opened even in info mode).
> Tagged PDF when the tagged option in PDF_begin_document( ) is true.
> PDF/A or PDF/X documents which are incompatible with the PDF/A or PDF/X level of the current output document.
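The version check for imported documents can be summarized as a small predicate. The sketch below is a hypothetical model: the PdfVersion enum and the canImportPages( ) method are illustrative names, not PDFlib API, and only the version rule (including the PDF 1.7ext 3 exception) is covered. Encryption, Tagged PDF, and PDF/A or PDF/X checks are left out.

```java
/** Hypothetical model of PDF versions in ascending order (not a PDFlib type). */
enum PdfVersion { PDF_1_3, PDF_1_4, PDF_1_5, PDF_1_6, PDF_1_7, PDF_1_7_EXT3 }

/** Sketch of the version rule for importing pages, as described above. */
class PdiVersionRule {
    static boolean canImportPages(PdfVersion input, PdfVersion output) {
        // Exception: PDF 1.7ext 3 (Acrobat 9) input is accepted for PDF 1.7 output.
        if (input == PdfVersion.PDF_1_7_EXT3 && output == PdfVersion.PDF_1_7) {
            return true;
        }
        // General rule: the input version must not exceed the output version
        // (enum constants compare by declaration order).
        return input.compareTo(output) <= 0;
    }
}
```

For example, canImportPages(PdfVersion.PDF_1_5, PdfVersion.PDF_1_4) returns false; in this situation you would raise the output version with the compatibility option.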
Similarly, you can use the position option with another combination of the keywords left, right, center, top, and bottom to place the object at the reference point.

Placing an image with scaling. The following variation will place the image while modifying its size:
p.fit_image(image, 0, 0, "scale=0.5");
This code fragment places the object with its lower left corner at the point (0, 0) in the user coordinate system. In addition, the object will be scaled in x and y direction by a scaling factor of 0.5, which makes it appear at 50 percent of its original size. Cookbook A full code sample can be found in the Cookbook topic images/starter_image.
Positioning an image in the box. We define a box and place an image within the box on the top right. The box has a width of 70 units and a height of 45 units and is placed at the reference point (0, 0). The image is placed on the top right of the box (see Figure 7.2a). Similarly, we can place the image at the center of the bottom. This case is depicted in Figure 7.2b.
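The placement within a box can be expressed with simple arithmetic. The following sketch assumes that the position keywords map to percentages (left and bottom to 0, center to 50, right and top to 100) and that the object is not scaled. The point at that percentage of the object is then aligned with the point at the same percentage of the box. The class and method names are hypothetical, and the object size used below is made up.

```java
/** Hypothetical sketch of positioning an unscaled object inside a box. */
class BoxPosition {
    /** Maps a position keyword to a percentage. */
    static double percent(String keyword) {
        switch (keyword) {
            case "left": case "bottom": return 0;
            case "center": return 50;
            case "right": case "top": return 100;
            default: throw new IllegalArgumentException("unknown keyword: " + keyword);
        }
    }

    /**
     * Returns the lower left corner {x, y} of an object of size ow x oh placed
     * in a box of size bw x bh whose lower left corner is the reference point (x, y).
     */
    static double[] place(double x, double y, double bw, double bh,
                          double ow, double oh, String horiz, String vert) {
        double px = percent(horiz);
        double py = percent(vert);
        return new double[] { x + (bw - ow) * px / 100.0, y + (bh - oh) * py / 100.0 };
    }
}
```

For the 70 x 45 box at (0, 0) and a hypothetical object of 30 x 20 units, place(0, 0, 70, 45, 30, 20, "right", "top") yields a lower left corner of (40, 25), while "center", "bottom" yields (20, 0).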
Fig. 7.2 Placing an image in a box subject to various positioning options
Fig. 7.3 Fitting an image into a box subject to various fit methods
Adjusting an object to the page. Adjusting an object to a given page size can easily be accomplished by choosing the page as target box for placing the object. The following statement uses an A4-sized page with dimensions 595 x 842:
p.fit_image(image, 0, 0, "boxsize={595 842} position={left bottom} fitmethod=slice");
In this code fragment a box is placed at the lower left corner of the page. The size of the box equals the size of an A4 page. The object is placed in the lower left corner of the box and scaled proportionally until it fully covers the box and therefore the page. If the object exceeds the box it will be cropped. Note that fitmethod=slice results in the object being scaled (as opposed to fitmethod=clip, which doesn't scale the object). Of course the position and fitmethod options could also be varied in this example.
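The fit methods differ mainly in how the scaling factors are derived from the box and object sizes. The following is a minimal sketch, assuming an object of size ow x oh and a box of size bw x bh. The helper is hypothetical and only covers meet, slice, clip, and the non-proportional entire method.

```java
/** Hypothetical sketch of the scaling factors implied by several fit methods. */
class FitMethodScale {
    /** Returns {sx, sy} for placing an object of size ow x oh into a box of size bw x bh. */
    static double[] scale(String fitmethod, double ow, double oh, double bw, double bh) {
        double sx = bw / ow;
        double sy = bh / oh;
        switch (fitmethod) {
            case "meet":   // proportional; the whole object fits into the box
                double m = Math.min(sx, sy);
                return new double[] { m, m };
            case "slice":  // proportional; the object covers the box, the excess is cropped
                double s = Math.max(sx, sy);
                return new double[] { s, s };
            case "entire": // non-proportional; the object fills the box exactly
                return new double[] { sx, sy };
            case "clip":   // original size; parts outside the box are clipped
                return new double[] { 1, 1 };
            default:
                throw new IllegalArgumentException("fit method not covered by this sketch: " + fitmethod);
        }
    }
}
```

For a hypothetical 400 x 300 image and the A4 box above, slice picks max(595/400, 842/300) = 842/300 ≈ 2.81, so the scaled image is about 1123 units wide and the excess width is cropped. With meet the factor would be 595/400 ≈ 1.49 instead.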
not specified any fit method the image will be output in its original size and will exceed the box. Fitting an image proportionally into a box with orientation. Our next goal is to orientate the image to the west with a predefined size. We define a box of the desired size and fit the image into the box with the images proportions being unchanged (fitmethod=meet). The orientation is specified as orientate=west. By default, the image will be placed in the lower left corner of the box (see Figure 7.6b). Figure 7.6c shows the image orientated to the east, and Figure 7.6d the orientation to the south. The orientate option supports the direction keywords north, east, west, and south as demonstrated in Figure 7.5. Note that the orientate option has no influence on the whole coordinate system but only on the placed object.
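Since orientate=west or orientate=east rotates the object by 90 degrees, the object's width and height swap roles before the fit method is applied. The following hypothetical sketch shows the effect on a proportional fit (fitmethod=meet). The sizes in the usage note are made up.

```java
/** Hypothetical sketch: effect of the orientate option on a meet fit. */
class OrientedFit {
    /** Returns the {width, height} an object occupies after applying the orientation. */
    static double[] orientedSize(String orientate, double w, double h) {
        switch (orientate) {
            case "north": case "south": return new double[] { w, h };  // 0 or 180 degrees
            case "west":  case "east":  return new double[] { h, w };  // 90 degrees: swap
            default: throw new IllegalArgumentException("unknown orientation: " + orientate);
        }
    }

    /** Proportional scale factor for fitmethod=meet after orientation. */
    static double meetScale(String orientate, double w, double h, double bw, double bh) {
        double[] size = orientedSize(orientate, w, h);
        return Math.min(bw / size[0], bh / size[1]);
    }
}
```

A 300 x 200 image oriented to the west occupies 200 x 300 units, so fitting it into a 100 x 150 box yields a scale factor of 0.5; without orientation (north) the factor would be 1/3.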
Fig. 7.6 Orientating an image
Fitting an oriented image into a box with clipping. We orientate the image to the east (orientate=east) and position it centered at the bottom of the box (position={center bottom}). In addition, we place the image in its original size and clip it if it exceeds the box (fitmethod=clip) (see Figure 7.6e).
Fig. 7.8 Adjusting the page size. Left to right: exact, enlarge, shrink
The next code fragment increases the page size by 40 units in x and y direction, creating a white border around the object:
p.fit_image(image, 40, 40, "adjustpage");
The next code fragment decreases the page size by 40 units in x and y direction. The object will be clipped at the page borders, and some area within the object (with a width of 40 units) will be invisible:
p.fit_image(image, -40, -40, "adjustpage");
In addition to placing by means of x and y coordinates (which specify the object's distance from the page edges, or from the coordinate axes in the general case) you can also specify a target box. This is a rectangular area in which the object will be placed subject to various formatting rules. These can be controlled with the boxsize, fitmethod and position options.

Cloning the page boxes of an imported PDF page. You can copy all relevant page boxes (MediaBox, CropBox, etc.) of an imported PDF page to the current output page. The cloneboxes option must be supplied to PDF_open_pdi_page( ) to read all relevant box values, and again to PDF_fit_pdi_page( ) to apply the box values to the current page:
/* Open the page and clone the page box entries */
inpage = p.open_pdi_page(indoc, 1, "cloneboxes");
...
/* Start the output page with a dummy page size */
p.begin_page_ext(10, 10, "");
...
/*
 * Place the imported page on the output page, and clone all
 * page boxes which are present in the input page; this will
 * override the dummy size used in begin_page_ext().
 */
p.fit_pdi_page(inpage, 0, 0, "cloneboxes");
Using this technique you can make sure that the pages in the generated PDF will have exactly the same page size, cropping area, etc. as the pages of the imported document. This is especially important for prepress applications.
Information about placed PDF pages. The PDF_info_pdi_page( ) function can be used to query information about placed PDF pages. The supported keywords for this function cover information about the original page (e.g. its width and height) as well as geometry information related to placing the imported PDF on the output page (e.g. width and height after performing the fitting calculations). The following code fragment retrieves both the original size of the imported page and the size after placing the page with certain fitting options:
String optlist = "boxsize={400 500} fitmethod=meet";
p.fit_pdi_page(page, 0, 0, optlist);

/* original size of the imported page */
double pagewidth = p.info_pdi_page(page, "pagewidth", optlist);
double pageheight = p.info_pdi_page(page, "pageheight", optlist);
System.err.println("original page size: " + pagewidth + " x " + pageheight);

/* size of the page after performing the fitting calculations */
double width = p.info_pdi_page(page, "width", optlist);
double height = p.info_pdi_page(page, "height", optlist);
System.err.println("size of placed page: " + width + " x " + height);
Figure 8.1 illustrates centered text placement. Similarly, you can use the position option with another combination of the keywords left, right, center, top, and bottom to place text at the reference point.
Placing text with orientation. Our next goal is to rotate text while placing its lower left corner (after the rotation) at the reference point. The following code fragment orientates the text to the west (90° counterclockwise) and then translates the lower left corner of the rotated text to the reference point (0, 0).
p.fit_textline(text, 0, 0, "orientate=west");
Aligning text at a horizontal or vertical line. Positioning text along a horizontal or vertical line (i.e. a box with zero height or width) is a somewhat extreme case which may nevertheless be useful. In Figure 8.4d the text is placed with the bottom centered at the box. With a width of 50 and a height of 0, the box resembles a horizontal line. To align the text centered along a vertical line we will orientate it to the west and position it at the left center of the box. This case is shown in Figure 8.4e.
can be used. In Figure 8.5h the text is placed at the bottom left of a box which is not broad enough. The text will be clipped on the right.
Fig. 8.5 Fitting text into a box on the page subject to various options
Vertically centering text. By default, the text height in PDF_fit_textline( ) is the capheight, i.e. the height of the capital letter H. If the text is positioned in the center of a box it will be vertically centered according to its capheight (see Figure 8.6a). To specify another height for the text line we can use the Matchbox feature (see also Section 8.4, Matchboxes, page 214). The matchbox option of PDF_fit_textline( ) defines the height of a Textline, which by default is the capheight of the given font size. The height of the matchbox is calculated according to its boxheight suboption, which determines the extent of the text above and below the baseline. matchbox={boxheight={capheight none}} is the default setting, i.e. the top border of the matchbox will touch the capheight above the baseline, and the bottom border of the matchbox will not extend below the baseline.
To illustrate the size of the matchbox we will fill it with red color (see Figure 8.6b). Figure 8.6c vertically centers the text according to the xheight by defining a matchbox with a corresponding box height. Figures 8.6d to 8.6f show the matchbox (red) with various useful boxheight settings to determine the height of the text to be centered in the box (blue).
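The effect of the boxheight suboption on vertical centering can be computed directly: the matchbox extends top × fontsize above and bottom × fontsize below the baseline, and it is this box which is centered. The sketch below is a hypothetical helper which assumes that the text fits without scaling (i.e. fitmethod=auto leaves it unchanged) and takes font metrics as fractions of the font size. The metric values in the usage note are made up; real ones depend on the font.

```java
/** Hypothetical sketch of vertical centering with a matchbox. */
class MatchboxCentering {
    /**
     * Returns the baseline y for text of the given font size centered vertically
     * in a box of height boxHeight whose bottom edge is at boxY.
     * top is the extent above the baseline (e.g. capheight), bottom the extent
     * below it (e.g. descender; 0 for "none"), both as fractions of the font size.
     */
    static double centeredBaseline(double boxY, double boxHeight, double fontsize,
                                   double top, double bottom) {
        double matchboxHeight = (top + bottom) * fontsize;
        double matchboxBottom = boxY + (boxHeight - matchboxHeight) / 2;
        // The baseline lies above the part of the matchbox that extends below it.
        return matchboxBottom + bottom * fontsize;
    }
}
```

With a box height of 20 as in Figure 8.6, font size 10, and an assumed capheight of 0.7, the baseline lies at 6.5 for boxheight={capheight none}. With an assumed ascender of 0.75 and descender of 0.25, boxheight={ascender descender} moves it to 7.5.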
Fig. 8.6 Fitting text proportionally into a box according to different box heights. Option lists for PDF_fit_textline( ):
a) boxsize={80 20} position=center fitmethod=auto
b) boxsize={80 20} position=center fitmethod=auto matchbox={boxheight={capheight none} fillcolor={rgb 1 0.8 0.8}}
c) boxsize={80 20} position=center fitmethod=auto matchbox={boxheight={xheight none} fillcolor={rgb 1 0.8 0.8}}
d) boxsize={80 20} position=center fitmethod=auto matchbox={boxheight={ascender none} fillcolor={rgb 1 0.8 0.8}}
e) boxsize={80 20} position=center fitmethod=auto matchbox={boxheight={ascender descender} fillcolor={rgb 1 0.8 0.8}}
f) boxsize={80 20} position=center fitmethod=auto matchbox={boxheight={fontsize none} fillcolor={rgb 1 0.8 0.8}}
8.1.6 Using Leaders
Leaders can be used to fill the space between the borders of the fitbox and the text. For example, dot leaders are often used as a visual aid between the entries in a table of contents and the corresponding page numbers.

Leaders in a table of contents. Using PDF_fit_textline( ) with the leader option and the alignment={none right} suboption, leaders are appended to the right of the text line, and repeated until the right border of the text box. There will be an equal distance between the rightmost leader and the right border, while the distance between the text and the leftmost leader may differ (see Figure 8.9a).

Cookbook A full code sample demonstrating the usage of dot leaders in a text line can be found in the Cookbook topic text_output/leaders_in_textline.

Cookbook A full code sample demonstrating the usage of dot leaders in a Textflow can be found in the Cookbook topic text_output/dot_leaders_with_tabs.

Leaders in a news ticker. In another use case you might want to create a news ticker effect. In this case we use a plus and a space character ("+ ") as leaders. The text line is placed in the center, and the leaders are printed before and after the text line (alignment={left right}). The left and right leaders are aligned to the left and right border, and might have a varying distance to the text (see Figure 8.9b).
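A simplified model of the table-of-contents case: whole leader strings are repeated from the right border towards the text, and whatever space is left over ends up between the text and the leftmost leader. This is a hypothetical sketch, not PDFlib's algorithm; in particular it assumes that the rightmost leader is flush with the right border.

```java
/** Hypothetical, simplified model of right-aligned leaders. */
class LeaderLayout {
    /** Number of whole leader strings that fit between the text and the right border. */
    static int leaderCount(double boxWidth, double textWidth, double leaderWidth) {
        double available = boxWidth - textWidth;
        return available <= 0 ? 0 : (int) Math.floor(available / leaderWidth);
    }

    /** Space left between the text and the leftmost leader in this model. */
    static double gapAfterText(double boxWidth, double textWidth, double leaderWidth) {
        return (boxWidth - textWidth) - leaderCount(boxWidth, textWidth, leaderWidth) * leaderWidth;
    }
}
```

For a 200-unit box, 123 units of text, and a 5-unit leader, 15 leaders fit and a gap of 2 units remains after the text.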
Cookbook A full code sample can be found in the Cookbook topic text_output/text_on_a_path.
Using an image clipping path for placing text. As an alternative to manually constructing a path object with the path functions you can extract the clipping path from an image and place text on the resulting path. The image must have been loaded with the honorclippingpath option, and the clippingpathname option must also be supplied to PDF_load_image( ) if the target path is not the image's default clipping path:
image = p.load_image("auto", "image.tif", "clippingpathname={path 1}");
if (image == -1)
    throw new Exception("Error: " + p.get_errmsg());

/* create a path object from the image's clipping path */
path = (int) p.info_image(image, "clippingpath", "");
if (path == -1)
    throw new Exception("Error: clipping path not found!");

/* Place text on the path */
p.fit_textline("Long Distance Glider with sensational range!", x, y,
    "textpath={path=" + path + "} position={center bottom}");
Creating a gap between path and text. By default, PDFlib will place individual characters on the path, which means that there will be no space between the glyphs and the path. If you want to create a gap between path and text you can simply increase the character boxes. This can easily be achieved with the boxheight suboption of the matchbox option, which specifies the vertical extension of the character boxes. The following option list takes the descenders into account (see Figure 8.10):
p.fit_textline("Long Distance Glider with sensational range!", x, y, "textpath={path=" + path + "} position={center bottom} " + "matchbox={boxheight={capheight descender}}");
Fig. 8.11 Text on a path with an additional gap between text and path
(Figure: sample invoice and product descriptions formatted with Textflows; the annotations show the tab ruler positions and the options leftindent, rightindent, parindent, alignment, and minlinecount.)
A multi-line Textflow can be placed into one or more rectangles (so-called fitboxes) on one or more pages. The following steps are required for placing a Textflow on the page:
> The function PDF_add_textflow( ) accepts portions of text and corresponding formatting options, creates a Textflow object, and returns a handle. As an alternative, the function PDF_create_textflow( ) analyzes the complete text in a single call, where the text may contain inline options for formatting control. These functions do not place any text on the page.
> The function PDF_fit_textflow( ) places all or parts of the Textflow in the supplied fitbox. To place the text completely, this step may have to be repeated several times, where each function call provides a new fitbox which may be located on the same or another page.
> The function PDF_delete_textflow( ) deletes the Textflow object after it has been placed in the document.

The functions PDF_add/create_textflow( ) for creating Textflows support a variety of options for controlling the formatting process. These options can be provided in the function's option list, or embedded as inline options in the text when using PDF_create_textflow( ). PDF_info_textflow( ) can be used to query formatting results and many other Textflow details. We will discuss Textflow placement using some common application examples. A complete list of Textflow options can be found in the PDFlib Reference.

Many of the options supported by PDF_add/create_textflow( ) are identical to those of PDF_fit_textline( ). It is therefore recommended to familiarize yourself with the examples in Section 8.1, Placing and Fitting Textlines, page 171. In the sections below we will focus on options related to multi-line text.

Cookbook Code samples regarding text output issues can be found in the text_output category of the PDFlib Cookbook.
piece of normal text. Font, font size, and encoding are specified explicitly. In the first call to PDF_add_textflow( ), -1 is supplied, and the Textflow handle will be returned to be used in subsequent calls to PDF_add_textflow( ), if required. text1 and text2 are assumed to contain the actual text to be printed. With PDF_fit_textflow( ), the resulting Textflow is placed in a fitbox on the page using default formatting options.
/* Add text with bold font */
tf = p.add_textflow(-1, text1, "fontname=Helvetica-Bold fontsize=9 encoding=unicode");
if (tf == -1)
    throw new Exception("Error: " + p.get_errmsg());

/* Add text with normal font */
tf = p.add_textflow(tf, text2, "fontname=Helvetica fontsize=9 encoding=unicode");
if (tf == -1)
    throw new Exception("Error: " + p.get_errmsg());

/* Place all text */
result = p.fit_textflow(tf, left_x, left_y, right_x, right_y, "");
if (!result.equals("_stop")) { /* ... */ }

p.delete_textflow(tf);
Placing text in two fitboxes on multiple pages. If the text placed with PDF_fit_textflow( ) doesn't completely fit into the fitbox, the output will be interrupted and the function will return the string _boxfull. PDFlib will remember the amount of text already placed, and will continue with the remainder of the text when the function is called again. In addition, it may be necessary to create a new page. The following code fragment demonstrates how to place a Textflow in two fitboxes per page on one or more pages until the text has been placed completely (see Figure 8.14).

Cookbook A full code sample can be found in the Cookbook topic text_output/starter_textflow.
/* Loop until all of the text is placed; create new pages as long as more text needs
 * to be placed. Two columns will be created on all pages.
 */
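The loop described above can be sketched without a PDF library. In the following simulation the hypothetical SimulatedTextflow class stands in for a Textflow handle and returns "_boxfull" or "_stop" like PDF_fit_textflow( ); the text is modeled as a number of lines, and each column holds a fixed number of them. Other return values are not modeled. In a real program each tf.fit( ) call would be a p.fit_textflow( ) call with the rectangle of the respective column, and each loop iteration would be bracketed by p.begin_page_ext( ) and p.end_page_ext("").

```java
/** Hypothetical stand-in for a Textflow handle: text is modeled as a number of lines. */
class SimulatedTextflow {
    private int remainingLines;

    SimulatedTextflow(int totalLines) {
        this.remainingLines = totalLines;
    }

    /** Places up to boxCapacity lines and reports "_stop" or "_boxfull". */
    String fit(int boxCapacity) {
        remainingLines -= Math.min(remainingLines, boxCapacity);
        return remainingLines == 0 ? "_stop" : "_boxfull";
    }
}

class TwoColumnLayout {
    /** Fills two columns per page until all text is placed; returns the page count. */
    static int layout(SimulatedTextflow tf, int linesPerColumn) {
        int pages = 0;
        String result;
        do {
            pages++;                              // a real program would begin a new page here
            result = tf.fit(linesPerColumn);      // first column
            if (!result.equals("_stop")) {
                result = tf.fit(linesPerColumn);  // second column, only if text remains
            }
            // a real program would end the page here
        } while (result.equals("_boxfull"));
        return pages;
    }
}
```

With 100 lines of text and 30 lines per column the loop needs 2 pages; 121 lines would require a third page.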
fitbox 2
fitbox 3
fitbox 4 page 2
21 Lorem ipsum dolor sit amet, consectetur
adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum. 22 Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum. 23 Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum. 24 Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum. 25 Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. 
Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum. 26 Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum. 27 Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.
do {
    String optlist = "verticalalign=justify linespreadlimit=120%";

    p.begin_page_ext(0, 0, "width=a4.width height=a4.height");

    /* Fill the first column */
    result = p.fit_textflow(tf, llx1, lly1, urx1, ury1, optlist);

    /* Fill the second column if we have more text */
    if (!result.equals("_stop"))
        result = p.fit_textflow(tf, llx2, lly2, urx2, ury2, optlist);

    p.end_page_ext("");

    /* "_boxfull" means we must continue because there is more text;
     * "_nextpage" is interpreted as "start new column"
     */
} while (result.equals("_boxfull") || result.equals("_nextpage"));

/* Check for errors */
if (!result.equals("_stop")) {
    /* "_boxempty" happens if the box is very small and doesn't hold any text at all. */
    if (result.equals("_boxempty"))
        throw new Exception("Error: " + p.get_errmsg());
    else {
        /* Any other return value is a user exit caused by the "return" option;
         * this requires dedicated code to deal with.
         */
    }
}

p.delete_textflow(tf);
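The fragment above uses the fitbox coordinates llx1 ... ury2 without defining them. As a sketch, assuming the A4 page size of 595 x 842 points, a 50-point margin, and a 20-point gutter between the columns (margin, gutter, and the helper class are hypothetical, not part of PDFlib), the two column boxes could be computed like this:

```java
/** Computes two side-by-side column fitboxes (hypothetical helper, not a PDFlib API). */
final class TwoColumnLayout {
    /** Returns {llx, lly, urx, ury} for column 0 (left) or column 1 (right). */
    static double[] column(int index, double pageWidth, double pageHeight,
                           double margin, double gutter) {
        // Width available for text, shared equally by both columns.
        double colWidth = (pageWidth - 2 * margin - gutter) / 2;
        // Each further column starts one column width plus the gutter to the right.
        double llx = margin + index * (colWidth + gutter);
        return new double[] { llx, margin, llx + colWidth, pageHeight - margin };
    }

    public static void main(String[] args) {
        double[] left = column(0, 595, 842, 50, 20);
        double[] right = column(1, 595, 842, 50, 20);
        System.out.printf("left:  %.1f %.1f %.1f %.1f%n", left[0], left[1], left[2], left[3]);
        System.out.printf("right: %.1f %.1f %.1f %.1f%n", right[0], right[1], right[2], right[3]);
    }
}
```

The four values of each array would then be passed as llx1, lly1, urx1, ury1 (left column) and llx2, lly2, urx2, ury2 (right column) to fit_textflow().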
Fig. 8.15 Placing a Textflow with options (annotated in the figure: leftindent=15, rightindent=10, parindent=20, leading=140%, alignment=justify)
Inline option lists for PDF_create_textflow( ). Up to now we supplied the formatting options in an option list passed directly to the function. To continue this way we would have to split the text and place it with two separate calls, one for the headline and another one for the remaining text. However, in some situations, e.g. with many formatting changes, this method can be quite cumbersome. For this reason PDF_create_textflow( ) can be used instead of PDF_add_textflow( ). PDF_create_textflow( ) interprets both the text and so-called inline options, which are embedded directly in the text. Inline option lists are provided as part of the body text; by default they are delimited by < and > characters. We will therefore integrate the options for formatting the heading and the remaining paragraphs into our body text as follows.
Note Inline option lists are colorized in all subsequent samples; end-of-paragraph characters are visualized with arrows.
<leftindent=15 rightindent=10 alignment=center fontname=Helvetica fontsize=12 encoding=winansi>Have a look at our new paper plane models!
<alignment=justify fontname=Helvetica leading=140% fontsize=8 encoding=winansi>Our paper planes are the ideal way of passing the time. We offer revolutionary new developments of the traditional common paper planes.
<parindent=20>If your lesson, conference, or lecture turn out to be deadly boring, you can have a wonderful time with our planes. All our models are folded from one paper sheet. They are exclusively folded without using any adhesive. Several models are equipped with a folded landing gear enabling a safe landing on the intended location provided that you have aimed well. Other models are able to fly loops or cover long distances. Let them start from a vista point in the mountains and see where they touch the ground.
[Fig. 8.16: paragraph types H1, Body, and Body_indented]
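To illustrate how the < and > delimiters separate option lists from body text in a sample like the one above, here is a simplified splitter in plain Java. It is a sketch only, not PDFlib's actual parser: it handles configurable delimiter characters but ignores nesting and escaping, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified splitter for text with inline option lists (illustration only). */
final class InlineOptionSplitter {
    /** A fragment is either an inline option list (isOptions=true) or plain body text. */
    static final class Fragment {
        final boolean isOptions;
        final String content;

        Fragment(boolean isOptions, String content) {
            this.isOptions = isOptions;
            this.content = content;
        }
    }

    static List<Fragment> split(String text, char begin, char end) {
        List<Fragment> out = new ArrayList<>();
        int pos = 0;
        while (pos < text.length()) {
            int open = text.indexOf(begin, pos);
            if (open < 0) {                       // no further option list: rest is text
                out.add(new Fragment(false, text.substring(pos)));
                break;
            }
            if (open > pos) {                     // body text before the option list
                out.add(new Fragment(false, text.substring(pos, open)));
            }
            int close = text.indexOf(end, open + 1);
            if (close < 0) {
                throw new IllegalArgumentException("unterminated option list at " + open);
            }
            out.add(new Fragment(true, text.substring(open + 1, close)));
            pos = close + 1;
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "<fontsize=12>Have a look!\n<parindent=20>Our paper planes...";
        for (Fragment f : split(text, '<', '>')) {
            System.out.println((f.isOptions ? "OPTIONS: " : "TEXT:    ") + f.content);
        }
    }
}
```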
The characters for bracketing option lists can be redefined with the begoptlistchar and endoptlistchar options. Supplying the keyword none for the begoptlistchar option completely disables the search for option lists. This is useful if the text doesn't contain any inline option lists and you want to make sure that < and > are processed as regular characters.
Macros. The text above contains several different types of paragraphs, such as a heading or body text with or without indentation. Each of these paragraph types is formatted differently and occurs multiple times in longer Textflows. To avoid starting each paragraph with the corresponding inline options, we can combine these in macros and refer to the macros in the text by name. As shown in Figure 8.16 we define three macros called H1 for the heading, Body for main paragraphs, and Body_indented for indented paragraphs. To use a macro, place the & character in front of its name and put it into an option list. The following code fragment defines three macros according to the previously used inline options and uses them in the text:
<macro {
H1 {leftindent=15 rightindent=10 alignment=center fontname=Helvetica fontsize=12 encoding=winansi}
Body {leftindent=15 rightindent=10 alignment=justify leading=140% fontname=Helvetica fontsize=8 encoding=winansi}
Body_indented {parindent=20 leftindent=15 rightindent=10 alignment=justify leading=140% fontname=Helvetica fontsize=8 encoding=winansi}
}>
<&H1>Have a look at our new paper plane models!
<&Body>Our paper planes are the ideal way of passing the time. We offer revolutionary new developments of the traditional common paper planes.
<&Body_indented>If your lesson, conference, or lecture turn out to be deadly boring, you can have a wonderful time with our planes. All our models are folded from one paper sheet. They are exclusively folded without using any adhesive. Several models are equipped with a folded landing gear enabling a safe landing on the intended location provided that you have aimed well. Other models are able to fly loops or cover long distances. Let them start from a vista point in the mountains and see where they touch the ground.
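A macro call such as <&H1> can be pictured as textual substitution of an &name reference by the macro's option list. The following sketch assumes a simple name-to-options map; the class, method, and map are hypothetical and do not describe how PDFlib stores or expands macros internally.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative expansion of &name macro references inside an option list. */
final class MacroExpander {
    // A macro reference: '&' followed by a name made of letters, digits, or '_'.
    private static final Pattern REF = Pattern.compile("&([A-Za-z_][A-Za-z0-9_]*)");

    static String expand(String optionList, Map<String, String> macros) {
        Matcher m = REF.matcher(optionList);
        StringBuilder sb = new StringBuilder();
        while (m.find()) {
            String body = macros.get(m.group(1));
            if (body == null) {
                throw new IllegalArgumentException("undefined macro: " + m.group(1));
            }
            // quoteReplacement keeps characters such as '$' or '\' in the body literal.
            m.appendReplacement(sb, Matcher.quoteReplacement(body));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> macros = Map.of(
            "H1", "alignment=center fontname=Helvetica fontsize=12 encoding=winansi",
            "Body", "alignment=justify leading=140% fontname=Helvetica fontsize=8 encoding=winansi");
        System.out.println(expand("&H1", macros));
    }
}
```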
Explicitly setting options. Note that all options which are not set in macros will retain their previous values. In order to avoid side effects caused by unwanted inheritance of options you should explicitly specify all settings required for a particular macro. This way you can ensure that the macros will behave consistently regardless of their ordering or combination with other option lists. On the other hand, you can take advantage of this behavior for deliberately retaining certain settings from the context instead of supplying them explicitly. For example, a macro could specify the font name without supplying the fontsize option. As a result, the font size will always match that of the preceding text.
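The inheritance rule can be modeled as merging each option list into the current formatting state: options that a macro sets overwrite the state, and all others keep their previous values. A minimal sketch, assuming flat key=value pairs separated by spaces (no braces or lists; bare keywords such as avoidbreak are simply ignored here); the class is hypothetical:

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Models option inheritance across successive option lists (illustration only). */
final class OptionState {
    private final Map<String, String> current = new LinkedHashMap<>();

    /** Applies a flat option list such as "fontname=Helvetica fontsize=8". */
    void apply(String optionList) {
        for (String token : optionList.trim().split("\\s+")) {
            int eq = token.indexOf('=');
            if (eq > 0) {
                // Later settings overwrite earlier ones; keys not mentioned are untouched.
                current.put(token.substring(0, eq), token.substring(eq + 1));
            }
        }
    }

    String get(String key) {
        return current.get(key);
    }

    public static void main(String[] args) {
        OptionState state = new OptionState();
        state.apply("fontname=Helvetica fontsize=12");
        state.apply("fontname=Times-Roman");   // a macro that sets only the font name
        System.out.println(state.get("fontname") + " " + state.get("fontsize"));
    }
}
```

The second call changes the font name but leaves fontsize at 12, mirroring the example of a macro that omits the fontsize option.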
[Fig. 8.17: simple table formatted with tab stops (right-aligned tab stops at 150, 250, and 350); visible columns: ITEM (1, 2, 3) and QUANTITY (2, 5, 1)]
Inline options or options passed as function parameters? When using Textflows it makes an important difference whether the text is contained literally in the program code or comes from an external source, and whether the formatting instructions are separate from the text or part of it. In most applications the actual text will come from an external source such as a database. In practice there are two main scenarios:
> Text contents from an external source, formatting options in the program: an external source delivers small text fragments which are assembled within the program and combined with formatting options (in the function call) at runtime.
> Text contents and formatting options from an external source: large amounts of text, including formatting options, come from an external source. The formatting is provided by inline options in the text, expressed as simple options or macros.
When it comes to macros, a distinction must be made between macro definition and macro call. This allows an interesting intermediate form: the text content comes from an external source and contains macro calls for formatting, while the macro definitions are supplied only at runtime. This has the advantage that the formatting can easily be changed without modifying the external text. For example, when generating greeting cards one could define different styles via macros to give the card a romantic, technical, or other touch.
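The intermediate form, where the external text contains only macro calls and the program supplies the definitions at runtime, might look like the following sketch. The style names, macro names, macro bodies, and class are hypothetical; the resulting string would be passed to PDF_create_textflow( ).

```java
import java.util.Map;

/** Chooses macro definitions at runtime for externally supplied text (illustration only). */
final class GreetingCardStyles {
    // Hypothetical styles: each defines the same macro names (Title, Text) differently.
    static final Map<String, String> STYLES = Map.of(
        "romantic", "<macro { Title {fontname=Times-Italic fontsize=24 encoding=winansi} "
                  + "Text {fontname=Times-Roman fontsize=12 encoding=winansi} }>",
        "technical", "<macro { Title {fontname=Helvetica-Bold fontsize=18 encoding=winansi} "
                   + "Text {fontname=Courier fontsize=10 encoding=winansi} }>");

    /** Prepends the macro definitions of the chosen style to the external text. */
    static String withStyle(String style, String externalText) {
        String definitions = STYLES.get(style);
        if (definitions == null) {
            throw new IllegalArgumentException("unknown style: " + style);
        }
        return definitions + externalText;
    }

    public static void main(String[] args) {
        // The external text only calls the macros; it never defines them.
        String external = "<&Title>Happy Birthday!\n<&Text>All the best for the coming year.";
        System.out.println(withStyle("technical", external));
    }
}
```

Switching the card's look then only requires a different style key; the external text stays unchanged.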
To place that simple table use the following option list in PDF_add/create_textflow( ). The ruler option defines the tab positions, tabalignment specifies the alignment of tab stops, and hortabmethod specifies the method used to process tab stops (the result can be seen in Figure 8.17):
String optlist =
    "ruler={30 150 250 350} " +
    "tabalignment={left right right right} " +
    "hortabmethod=ruler leading=120% fontname=Helvetica fontsize=9 encoding=winansi";
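When the tab positions come from program data rather than being fixed in the source, the ruler and tabalignment values can be assembled from two parallel arrays. A sketch with a hypothetical helper class; with the values used above it reproduces the first part of the option list:

```java
/** Builds ruler/tabalignment options from parallel arrays (hypothetical helper). */
final class RulerOptions {
    static String build(int[] positions, String[] alignments) {
        if (positions.length != alignments.length) {
            throw new IllegalArgumentException("one alignment per tab position required");
        }
        StringBuilder ruler = new StringBuilder("ruler={");
        StringBuilder align = new StringBuilder("tabalignment={");
        for (int i = 0; i < positions.length; i++) {
            if (i > 0) {                 // separate list elements with a single space
                ruler.append(' ');
                align.append(' ');
            }
            ruler.append(positions[i]);
            align.append(alignments[i]);
        }
        return ruler.append("} ").append(align).append('}').toString();
    }

    public static void main(String[] args) {
        String optlist = build(new int[] {30, 150, 250, 350},
                               new String[] {"left", "right", "right", "right"})
            + " hortabmethod=ruler leading=120% fontname=Helvetica fontsize=9 encoding=winansi";
        System.out.println(optlist);
    }
}
```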
Cookbook A full code sample can be found in the Cookbook topic text_output/tabstops_in_textflow.
Note PDFlib's table feature is recommended for creating complex tables (see Section 8.3, "Table Formatting", page 199).
Cookbook Full code samples for bulleted and numbered lists can be found in the Cookbook topics text_output/bulleted_list and text_output/numbered_list.
Setting and resetting the indentation value is cumbersome, especially since it is required for each paragraph. A more elegant solution defines a macro called list. For convenience it also defines a macro called indent, which is used as a constant. The macro definitions are as follows:
<macro {
indent {25}
list {parindent=-&indent leftindent=&indent hortabsize=&indent hortabmethod=ruler ruler={&indent}}
}>
<&list>1. Long Distance Glider: With this paper rocket you can send all your messages even when sitting in a hall or in the cinema pretty near the back.
2. Giant Wing: An unbelievable sailplane! It is amazingly robust and can even do aerobatics. But it is best suited to gliding.
3. Cone Head Rocket: This paper arrow can be thrown with big swing. We launched it from the roof of a hotel. It stayed in the air a long time and covered a considerable distance.
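Because the list macro defines a tab stop at &indent, each item is expected to contain a tab character between its number and its text. A sketch that assembles such body text from a list of items, reusing the macro definitions shown above (the helper class and method are hypothetical):

```java
import java.util.List;

/** Assembles numbered-list Textflow text using the list macro (illustration only). */
final class NumberedList {
    static final String MACROS =
        "<macro { indent {25} list {parindent=-&indent leftindent=&indent "
        + "hortabsize=&indent hortabmethod=ruler ruler={&indent}} }>";

    /** Produces "1.<TAB>item" paragraphs separated by newlines. */
    static String build(List<String> items) {
        StringBuilder sb = new StringBuilder(MACROS).append("<&list>");
        for (int i = 0; i < items.size(); i++) {
            if (i > 0) {
                sb.append('\n');                              // end of the previous paragraph
            }
            sb.append(i + 1).append(".\t").append(items.get(i));  // number, tab, item text
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(build(List.of("Long Distance Glider: ...", "Giant Wing: ...")));
    }
}
```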
The leftindent option specifies the distance from the left margin. The parindent option, which is set to the negative of leftindent, cancels the indentation for the first line of each paragraph. The options hortabsize, hortabmethod, and ruler specify a tab stop which corresponds to leftindent. As a result, the text after the number is indented by the amount specified in leftindent. Figure 8.19 shows the parindent and leftindent options at work.
Fig. 8.18 Numbered list
[Fig. 8.19: the parindent and leftindent options applied to the numbered list]
Setting the distance between two paragraphs. In many cases more distance between adjacent paragraphs is desired than between the lines within a paragraph. This can be achieved by inserting an extra empty line (which can be created with the nextline option), and specifying a suitable leading value for this empty line. This value is the distance between the baseline of the last line of the previous paragraph and the baseline of the empty line. The following example will create 80% additional space between the two paragraphs (where 100% equals the most recently set value of the font size):
1. Long Distance Glider: With this paper rocket you can send all your messages even when sitting in a hall or in the cinema pretty near the back. <nextline leading=80%><nextparagraph leading=100%>2. Giant Wing: An unbelievable sailplane! It is amazingly robust and can even do aerobatics. But it is best suited to gliding.
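Since percentage leading values refer to the current font size, the resulting baseline distances are easy to compute. A minimal arithmetic sketch, assuming a hypothetical font size of 9 points and leading values written as fractions (0.80 for 80%): the empty line contributes 80% and the first line of the next paragraph another 100%, so the gap grows from 9 to 16.2 points.

```java
/** Arithmetic for the empty-line paragraph spacing technique (illustration only). */
final class ParagraphSpacing {
    /** Baseline distance from the last line of one paragraph to the first line of the next. */
    static double baselineGap(double fontSize, double emptyLineLeading, double nextLeading) {
        // The empty line created with <nextline leading=...> contributes its own leading;
        // the first line of the next paragraph then adds its leading on top.
        return fontSize * emptyLineLeading + fontSize * nextLeading;
    }

    public static void main(String[] args) {
        double fontSize = 9;
        double gap = baselineGap(fontSize, 0.80, 1.00);
        double normal = fontSize * 1.00;              // regular line-to-line distance
        System.out.printf("gap = %.1f pt, extra = %.1f pt%n", gap, gap - normal);
    }
}
```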
Cookbook A full code sample can be found in the Cookbook topic text_output/distance_between_paragraphs.
Replacing characters or sequences of characters. The charmapping option can be used to replace some characters in the text with others. Let's start with an easy case where we replace all tabs in the text with space characters. The charmapping option to achieve this looks as follows:
charmapping={hortab space}
This command uses the symbolic character names hortab and space. You can find a list of all known character names in the PDFlib Reference. To achieve multiple mappings at once you can use the following command, which will replace all tabs and line break combinations with space characters:
charmapping={hortab space CRLF space LF space CR space}
The following option reduces each arbitrarily long sequence of linefeed characters to a single linefeed character:
charmapping={linefeed {linefeed -1}}
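The effect of the three charmapping examples can be modeled in plain Java with string replacement and regular expressions. This is an approximation of the documented behavior only (in particular, the count -1 is read as "an arbitrarily long run maps to one character"); the class and method names are hypothetical and this is not how PDFlib implements charmapping.

```java
/** Plain-Java model of the charmapping examples (illustration only). */
final class CharMappingDemo {
    /** Models charmapping={hortab space}: every tab becomes a space. */
    static String tabsToSpaces(String s) {
        return s.replace('\t', ' ');
    }

    /** Models charmapping={hortab space CRLF space LF space CR space}. */
    static String breaksToSpaces(String s) {
        // CRLF is listed first so that the pair yields one space, not two.
        return s.replaceAll("\r\n|\n|\r|\t", " ");
    }

    /** Models charmapping={linefeed {linefeed -1}}: runs of linefeeds collapse to one. */
    static String collapseLinefeeds(String s) {
        return s.replaceAll("\n+", "\n");
    }

    public static void main(String[] args) {
        System.out.println(collapseLinefeeds("Take a sheet of paper.\n\n\nFold it."));
        System.out.println(breaksToSpaces("Fold it\r\nlengthwise\tin the middle."));
    }
}
```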
We will take a closer look at the last example. Let's assume you receive text in which the lines have been separated with fixed line breaks by some other software, and therefore cannot be formatted properly. You want to replace the line breaks with space characters in order to achieve proper formatting within the fitbox. To achieve this we replace arbitrarily long sequences of line breaks with a single space character. The initial text looks as follows:
To fold the famous rocket looper proceed as follows: Take a sheet of paper. Fold it lengthwise in the middle. Then, fold down the upper corners. Fold the long sides inwards that the points A and B meet on the central fold.
The following code fragment demonstrates how to replace the redundant linebreak characters and format the resulting text:
/* assemble option list; {space -1} replaces each run of CRLF sequences with one space */
String optlist =
    "fontname=Helvetica fontsize=9 encoding=winansi alignment=justify " +
    "charmapping={CRLF {space -1}}";

/* place textflow in fitbox */
textflow = p.add_textflow(-1, text, optlist);
if (textflow == -1)
    throw new Exception("Error: " + p.get_errmsg());

result = p.fit_textflow(textflow, left_x, left_y, right_x, right_y, "");
if (!result.equals("_stop"))
{
    /* ... */
}
p.delete_textflow(textflow);
Fig. 8.20 Top: text with redundant line breaks. Bottom: replacing the line breaks with the charmapping option
Figure 8.20 shows Textflow output with the unmodified text and the improved version with the charmapping option.
Symbol fonts in Textflows and the textlen option. Symbol fonts (more precisely, text in a font which is not Unicode-compatible according to Section 4.3.4, "Unicode-compatible Fonts", page 90) deserve some special attention when used within Textflows:
> The control characters will not be treated specially, i.e. they have no special meaning.
> Some Textflow options will be ignored since they do not make sense for symbol fonts, e.g. tabalignchar.
> Inline option lists cannot be used in text portions with symbol fonts: since the symbols don't have any intrinsic meaning, it would be impossible to locate and interpret option lists. Therefore the length of text fragments consisting of symbol characters must be specified explicitly with the textlen option.
> After textlen characters a new inline option list must be placed in the text. Usually the next option list will switch to another font/encoding combination, but this is not required.
Omitting the textlen option for Symbol fragments, or failing to supply another inline option list immediately after the Symbol fragment, will result in an exception. The following fragment contains a Greek character from the Symbol font inserted between Latin characters:
<fontname=Helvetica fontsize=12 encoding=winansi>The Greek letter <fontname=Symbol encoding=builtin textlen=1>A<fontname=Helvetica encoding=winansi> symbolizes beginning.
Using characters with codes greater than 127 (0x7F) can be tricky, depending on the syntax requirements of the programming language in use. The following examples create a right arrow from the ZapfDingbats font. This character has the glyph name a161 and code 0xD5, which corresponds to the character Õ in winansi. The following example uses PDFlib's escape sequence syntax \xD5. If used directly in a C language program, the backslash must be preceded by another backslash. Processing of escape sequences must be enabled with the escapesequence option. The length of the fragment (after \\ processing) is 4 bytes:
<escapesequence fontname=ZapfDingbats encoding=builtin textlen=4>\\xD5<fontname=Helvetica encoding=winansi>
The following example uses the \u syntax of Java and other languages. The length of the text fragment (after \u expansion) is 1 Unicode character:
The following example uses a literal character, assuming the source code is compiled in the winansi/cp1252 codepage (e.g. javac -encoding 1252). Again, the length of the text fragment is 1:
<fontname=ZapfDingbats encoding=builtin textlen=1>Õ<fontname=Helvetica encoding=winansi>
Instead of addressing the character numerically we can refer to its glyph name, using PDFlib's glyph name reference syntax (see Section 4.5.2, "Character References and Glyph Name References", page 96), which requires unicode encoding. Glyph name processing must be enabled with the charref option. The length of the text fragment is 7 characters since the complete contents of the glyph name reference are counted. In Unicode-aware language bindings the following example will do the trick:
<charref fontname=ZapfDingbats encoding=unicode textlen=7>&.a161;<fontname=Helvetica encoding=winansi>
In non-Unicode-aware language bindings we must set the text format to bytes since otherwise two bytes per character would be required for unicode encoding:
<charref fontname=ZapfDingbats encoding=unicode textformat=bytes textlen=7>&.a161;<fontname=Helvetica encoding=winansi>
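The textlen values in these examples follow directly from the length of each fragment. In Java the same counts can be checked with String.length(): the Java literal "\\xD5" holds the four characters backslash, x, D, 5 (like the C example after backslash processing), the literal "\u00D5" is expanded by the compiler into the single character Õ, and the glyph name reference &.a161; has seven characters. A small sketch (class and constant names are hypothetical):

```java
/** Checks the textlen values used in the ZapfDingbats examples (illustration only). */
final class TextlenCheck {
    static final String ESCAPE = "\\xD5";      // 4 chars: backslash, x, D, 5
    static final String LITERAL = "\u00D5";    // 1 char after compiler expansion
    static final String GLYPH_REF = "&.a161;"; // 7 chars: the whole reference is counted

    /** Returns the textlen value for each fragment variant. */
    static int[] textlenValues() {
        return new int[] { ESCAPE.length(), LITERAL.length(), GLYPH_REF.length() };
    }

    public static void main(String[] args) {
        int[] n = textlenValues();
        System.out.println("escape=" + n[0] + " literal=" + n[1] + " glyphref=" + n[2]);
        System.out.println("code of literal: 0x" + Integer.toHexString(LITERAL.charAt(0)));
    }
}
```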
8.2.7 Hyphenation
PDFlib does not automatically hyphenate text, but it can break words at hyphenation opportunities which are explicitly marked in the text with soft hyphen characters. The soft hyphen character is at position U+00AD in Unicode, and several methods are available for specifying the soft hyphen in non-Unicode environments:
> In all cp1250 to cp1258 (including winansi) and iso8859-1 to iso8859-16 encodings the soft hyphen is at decimal 173, octal 255, or hexadecimal 0xAD.
> In ebcdic encoding the soft hyphen is at decimal 202, octal 312, or hexadecimal 0xCA.
> A character entity reference can be used if an encoding does not contain the soft hyphen character (e.g. macroman): &shy; (U+002D will then be used as the hyphenation character).
In addition to the break opportunities designated by soft hyphens, words can be forcefully hyphenated in extreme cases when other methods of adjustment, such as changing the word spacing or shrinking text, are not possible.
Justified text with or without hyphen characters. In the following example we print this text with justified alignment. The text contains soft hyphen characters (visualized here as dashes):
Our paper planes are the ideal way of passing the time. We offer revolutionary brand new developments of the traditional common paper planes. If your lesson, conference, or lecture turn out to be deadly boring, you can have a wonderful time with our planes. All our models are folded from one paper sheet. They are exclusively folded without using any adhesive. Several models are equipped with a folded landing gear enabling a safe landing on the intended location provided that you have aimed well. Other models are able to fly loops or cover long distances. Let them start from a vista point in the mountains and see where they touch the ground.
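The soft hyphen code positions listed above are one value in different notations, which is quick to verify in Java. The sketch also shows how a program might insert U+00AD soft hyphens at chosen break points before passing text to a Textflow; the helper and the break positions are hypothetical.

```java
/** Soft hyphen code positions and insertion (illustration only). */
final class SoftHyphens {
    static final char SHY = '\u00AD';  // SOFT HYPHEN, U+00AD

    /** Inserts a soft hyphen before each index (ascending, relative to the original word). */
    static String hyphenate(String word, int... breaks) {
        StringBuilder sb = new StringBuilder(word);
        // Insert from the end so that the earlier indices remain valid.
        for (int i = breaks.length - 1; i >= 0; i--) {
            sb.insert(breaks[i], SHY);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println("winansi: " + (int) SHY + " dec, " + Integer.toOctalString(SHY) + " oct");
        System.out.println("ebcdic:  " + 0xCA + " dec, " + Integer.toOctalString(0xCA) + " oct");
        String word = hyphenate("revolutionary", 6);
        System.out.println(word.replace(SHY, '-'));  // show the soft hyphen as a dash
    }
}
```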
Fig. 8.21 Justified text with soft hyphen characters, using default settings and a wide fitbox
Fig. 8.22 Justified text without soft hyphens, using default settings and a wide fitbox
Figure 8.21 shows the generated text output with default settings for justified text. It looks perfect since the conditions are optimal: the fitbox is wide enough, and there are explicit break opportunities specified by the soft hyphen characters. As you can see in Figure 8.22, the output looks okay even without explicit soft hyphens. The option list in both cases looks as follows:
fontname=Helvetica fontsize=9 encoding=winansi alignment=justify
Table 8.1 Options for controlling the line-breaking algorithm
option          explanation
adjustmethod    (Keyword) The method used to adjust a line when a text portion doesn't fit into a line after compressing or expanding the distance between words subject to the limits specified by the minspacing and maxspacing options. Default: auto
                auto     The following methods are applied in order: shrink, spread, nofit, split.
                clip     Same as nofit (see below), except that the long part at the right edge of the fitbox (taking into account the rightindent option) will be clipped.
                nofit    The last word will be moved to the next line provided the remaining (short) line will not be shorter than the percentage specified in the nofitlimit option. Even justified paragraphs will look slightly ragged in this case.
                shrink   If a word doesn't fit in the line the text will be compressed subject to the shrinklimit option until the word fits. If it still doesn't fit the nofit method will be applied.
                split    The last word will not be moved to the next line, but will forcefully be hyphenated. For text fonts a hyphen character will be inserted, but not for symbol fonts.
                spread   The last word will be moved to the next line and the remaining (short) line will be justified by increasing the distance between characters in a word, subject to the spreadlimit option. If justification still cannot be achieved the nofit method will be applied.
avoidbreak      (Boolean) If true, avoid any line breaks until avoidbreak is reset to false. Default: false
charclass       (List of pairs, where the first element in each pair is a keyword, and the second element is either a unichar or a list of unichars) The specified unichars will be classified by the specified keyword to determine the line breaking behavior of those characters:
                letter   behave like a letter (e.g. a B)
                punct    behave like a punctuation character (e.g. + / ; : )
                open     behave like an open parenthesis (e.g. [ )
                close    behave like a close parenthesis (e.g. ] )
                default  reset all character classes to PDFlib's builtin defaults
                Example: charclass={letter {/ : =} punct &}
hyphenchar      (Unichar or keyword) Unicode value of the character which replaces a soft hyphen at line breaks. The value 0 and the keyword none completely suppress hyphens. Default: U+00AD (SOFT HYPHEN) if available in the font, U+002D (HYPHEN-MINUS) otherwise
maxspacing,     (Float or percentage) Specifies the maximum or minimum distance between words (in user coordinates, or as a percentage of the width of the space character). The calculated word spacing is limited by the provided values (but the wordspacing option will still be added).
minspacing      Defaults: minspacing=50%, maxspacing=500%
nofitlimit      (Float or percentage) Lower limit for the length of a line with the nofit method (in user coordinates or as a percentage of the width of the fitbox). Default: 75%
shrinklimit     (Percentage) Lower limit for compressing text with the shrink method; the calculated shrinking factor is limited by the provided value, but will be multiplied with the value of the horizscaling option. Default: 85%
spreadlimit     (Float or percentage) Upper limit for the distance between two characters for the spread method (in user coordinates or as a percentage of the font size); the calculated character distance will be added to the value of the charspacing option. Default: 0
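As a rough model of how some of the limits in Table 8.1 interact, the following sketch checks whether an overlong line could be rescued by the shrink method, and whether the nofit method would leave a line that is long enough. It is a simplification under stated assumptions (line widths known in advance, no word spacing adjustment, limits as fractions), with hypothetical names; it is not PDFlib's actual line-breaking algorithm.

```java
/** Simplified model of the shrinklimit and nofitlimit checks (illustration only). */
final class LineFitModel {
    /** shrink: the line fits if the required compression factor is not below shrinklimit. */
    static boolean shrinkFits(double lineWidth, double boxWidth, double shrinkLimit) {
        if (lineWidth <= boxWidth) {
            return true;                       // nothing to compress
        }
        return boxWidth / lineWidth >= shrinkLimit;
    }

    /** nofit: moving the last word is acceptable if the remaining line is long enough. */
    static boolean nofitAcceptable(double remainingWidth, double boxWidth, double nofitLimit) {
        return remainingWidth >= boxWidth * nofitLimit;
    }

    public static void main(String[] args) {
        double box = 100;
        // Defaults from Table 8.1: shrinklimit=85%, nofitlimit=75%.
        System.out.println(shrinkFits(110, box, 0.85));      // factor about 0.91
        System.out.println(shrinkFits(125, box, 0.85));      // factor 0.8
        System.out.println(nofitAcceptable(70, box, 0.75));  // 70 is below 75
    }
}
```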
characters there is a line break opportunity at the beginning of the option list. If a line break occurs at the option list and alignment=justify, the spaces preceding the option list will be discarded. The spaces after the option list will be retained, and will appear at the beginning of the next line.
Fig. 8.23 Justified text in a narrow fitbox with default settings. Annotations: decrease the distance between words (minspacing option); compress the line (shrink method, shrinklimit option); force hyphenation (split method)
Preventing linebreaks. You can use the charclass option to prevent Textflow from breaking a line after specific characters. For example, the following option will prevent line breaks immediately after the / character:
charclass={letter /}
In order to prevent a sequence of text from being broken across lines you can bracket it with avoidbreak...noavoidbreak.
Cookbook A full code sample can be found in the Cookbook topic text_output/avoid_linebreaking.
Advanced script-specific linebreaking. The advancedlinebreaking and locale options can be used to enable an advanced linebreaking algorithm which takes language-specific and script-specific linebreaking rules into account; see Section 6.3.5, Advanced Line Breaking, page 138, for more information.
Formatting CJK text. The Textflow engine is prepared to deal with CJK text and properly treats CJK characters as ideographic glyphs as per the Unicode standard. As a result, CJK text will never be hyphenated. For improved formatting the following options are recommended when using Textflow with CJK text; they will disable hyphenation for inserted Latin text and create evenly spaced text output:
hyphenchar=none alignment=justify shrinklimit=100% spreadlimit=100%
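Composing protected phrases and the recommended CJK option list by hand is error-prone, so the following Java sketch collects them in one place. The class TextflowMarkup and its methods are hypothetical helpers, not part of the PDFlib API; the sketch assumes that inline option lists are delimited by the default characters < and >, and that the resulting text is passed to PDF_create_textflow( ), which interprets inline option lists.

```java
/**
 * Hypothetical helpers for composing Textflow input and option lists.
 * Assumption: inline option lists use the default delimiters < and >.
 */
final class TextflowMarkup {
    private TextflowMarkup() {}

    /** Option list recommended in the text for Textflow with CJK text. */
    static final String CJK_OPTIONS =
        "hyphenchar=none alignment=justify shrinklimit=100% spreadlimit=100%";

    /** Brackets a phrase with avoidbreak...noavoidbreak so it is not broken across lines. */
    static String keepTogether(String phrase) {
        return "<avoidbreak>" + phrase + "<noavoidbreak>";
    }

    /** Builds a charclass option that treats c as a letter, preventing a break right after it. */
    static String letterCharclass(char c) {
        return "charclass={letter " + c + "}";
    }
}
```

For example, "Contact " + TextflowMarkup.keepTogether("PDFlib GmbH") + " today" yields text in which the company name should stay on one line.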
Vertical writing mode is not supported in Textflow.
Justified text in a narrow fitbox. The narrower the fitbox, the more important the options for controlling justified text become. Figure 8.23 demonstrates the results of the various methods for justifying text in a narrow fitbox. The option settings in Figure 8.23 are basically okay, with the exception of maxspacing, which produces a rather large distance between words. However, it is recommended to keep this setting for narrow fitboxes, since otherwise the ugly forced hyphenation caused by the split method will occur more often. If the fitbox is so narrow that forced hyphenation still occurs occasionally, you should consider inserting soft hyphens or modifying the options which control justified text.
Option shrinklimit for justified text. The most visually pleasing solution is to reduce the shrinklimit option which specifies a lower limit for the shrinking factor applied by the shrink method. Figure 8.24a shows how to avoid forced hyphenation by compressing text down to shrinklimit=50%.
Fig. 8.24 Options for justified text in a narrow fitbox (generated output and option list for PDF_fit_textflow( )):
a) alignment=justify shrinklimit=50%
b) alignment=justify spreadlimit=5
c) alignment=justify nofitlimit=50
Option spreadlimit for justified text. Expanding text, which is achieved by the spread method and controlled by the spreadlimit option, is another method for controlling line breaks. However, this unpleasing method should be used only rarely. Figure 8.24b demonstrates a very large maximum character distance of 5 units using spreadlimit=5.
Option nofitlimit for justified text. The nofitlimit option controls how small a line can get when the nofit method is applied. Reducing the default value of 75% is preferable to forced hyphenation when the fitbox is very narrow. Figure 8.24c shows the generated text output with a minimum text width of 50%.
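The three limits can be summarized numerically. The following Java sketch is a simplified model of the clamping rules stated in the option descriptions (shrink factor bounded by shrinklimit and multiplied with horizscaling, character distance bounded by spreadlimit and increased by charspacing, minimum line length given by nofitlimit). It uses fractions instead of percentages and ignores how PDFlib chooses between the methods; the class JustifyLimits and its methods are hypothetical, not PDFlib's internal algorithm.

```java
/**
 * Hypothetical model of the justification limits described above.
 * Percentages are expressed as fractions (0.85 means 85%).
 */
final class JustifyLimits {
    private JustifyLimits() {}

    /** shrink method: the needed factor, but never below shrinklimit, times horizscaling. */
    static double shrinkFactor(double naturalWidth, double lineWidth,
                               double shrinklimit, double horizscaling) {
        // 1.0 means the line already fits without compression
        double needed = naturalWidth <= lineWidth ? 1.0 : lineWidth / naturalWidth;
        return Math.max(needed, shrinklimit) * horizscaling;
    }

    /** spread method: extra character distance capped at spreadlimit, plus charspacing. */
    static double charDistance(double neededExtra, double spreadlimit, double charspacing) {
        return Math.min(neededExtra, spreadlimit) + charspacing;
    }

    /** nofit method: shortest acceptable line length for a given fitbox width. */
    static double minLineWidth(double fitboxWidth, double nofitlimit) {
        return fitboxWidth * nofitlimit;
    }
}
```

With the default shrinklimit of 85%, a line that would need 75% compression is only shrunk to 85%, so another method has to take over; lowering the limit to 50%, as in Figure 8.24a, lets shrinking alone handle that line.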
The Textflow is added. Then we place it using the wrap option with the image's matchbox img as the area to run around, as follows (see Figure 8.25):
result = p.fit_textflow(textflow, left_x, left_y, right_x, right_y, "wrap={usematchboxes={{img}}}");
Before placing the text you can fit more images using the same matchbox name. In this case the text will run around all images.
Cookbook A full code sample can be found in the Cookbook topic text_output/wrap_text_around_images.
Wrapping text around an arbitrary path. You can create a path object (see Section 3.2.3, Direct Paths and Path Objects, page 62) and use it as a wrap shape. The following fragment constructs a path object with a simple shape (a circle) and supplies it to the wrap option of PDF_fit_textflow( ). The reference point for placing the path is expressed as a percentage of the fitbox's width and height:
path = p.add_path_point(-1, 0, 100, "move", "");
path = p.add_path_point(path, 200, 100, "control", "");
path = p.add_path_point(path, 0, 100, "circular", "");

/* Visualize the path if desired */
p.draw_path(path, x, y, "stroke");

result = p.fit_textflow(tf, llx1, lly1, urx1, ury1,
        "wrap={paths={" + "{path=" + path + " refpoint={100% 50%} }" + "}}");
p.delete_path(path);
Use the inversefill option to wrap text inside the path instead of wrapping the text around the path (i.e. the path serves as text container instead of creating a hole in the Textflow):
result = p.fit_textflow(tf, llx1, lly1, urx1, ury1, "wrap={inversefill paths={" + "{path=" + path + " refpoint={100% 50%} }" + "}}");
Fig. 8.25 Wrapping text around an image with matchbox
Wrapping text around an image clipping path. TIFF and JPEG images can contain an integrated clipping path. The path must have been created in an image processing application and will be evaluated by PDFlib. If a default clipping path is found in the image it will be used, but you can specify any other clipping path in the image with the clippingpathname option of PDF_load_image( ). If the image has been loaded with a clipping path you can extract the path and supply it to the wrap option of PDF_fit_textflow( ) as above. We also supply the scale option to enlarge the imported image clipping path:
image = p.load_image("auto", "image.tif", "clippingpathname={path 1}");
if (image == -1)
    throw new Exception("Error: " + p.get_errmsg());

/* Create a path object from the image's clipping path */
path = (int) p.info_image(image, "clippingpath", "");
if (path == -1)
    throw new Exception("Error: clipping path not found!");

result = p.fit_textflow(tf, llx1, lly1, urx1, ury1,
        "wrap={paths={{path=" + path + " refpoint={50% 50%} scale=2}}}");
p.delete_path(path);
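The wrap option strings for path objects above differ only in the path handle, the reference point, the optional inversefill keyword and the optional scale suboption. The following Java sketch assembles them with a hypothetical helper class PathWrap (not part of the PDFlib API); it only builds strings and assumes the path handle was obtained as shown above.

```java
/** Hypothetical builder for the wrap option of PDF_fit_textflow( ) using path objects. */
final class PathWrap {
    private PathWrap() {}

    /**
     * @param path     path handle, e.g. from add_path_point() or info_image()
     * @param refpoint reference point, e.g. "100% 50%" or absolute "120 300"
     * @param inverse  true adds inversefill: text flows inside the path
     * @param scale    scale factor for the path; 1 omits the suboption
     */
    static String wrap(int path, String refpoint, boolean inverse, double scale) {
        StringBuilder sb = new StringBuilder("wrap={");
        if (inverse) sb.append("inversefill ");
        sb.append("paths={{path=").append(path)
          .append(" refpoint={").append(refpoint).append("}");
        if (scale != 1.0) sb.append(" scale=").append(fmt(scale));
        sb.append("}}}");
        return sb.toString();
    }

    /** Prints whole numbers without a trailing ".0". */
    private static String fmt(double v) {
        return v == Math.rint(v) ? Long.toString((long) v) : Double.toString(v);
    }
}
```

PathWrap.wrap(path, "50% 50%", false, 2) reproduces the option list of the clipping path example, and passing true for inverse gives the inversefill variant.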
Placing an image and wrapping text around it. While the previous section used only the clipping path of an image (but not the image itself), let's now place the image inside the fitbox of the Textflow and wrap the text around it. In order to achieve this we must again load the image with the clippingpathname option and place it on the page with PDF_fit_image( ). In order to create the proper path object for wrapping the Textflow we call PDF_info_image( ) with the same option list as PDF_fit_image( ). Finally, the reference point (the x/y parameters of PDF_fit_image( )) must be supplied to the refpoint suboption of the paths suboption of the wrap option:
image = p.load_image("auto", "image.tif", "clippingpathname={path 1}");
if (image == -1)
    throw new Exception("Error: " + p.get_errmsg());

/* Place the image on the page with some fitting options */
String imageoptlist = "scale=2";
p.fit_image(image, x, y, imageoptlist);

/* Create a path object from the image's clipping path, using the same option list */
path = (int) p.info_image(image, "clippingpath", imageoptlist);
if (path == -1)
    throw new Exception("Error: clipping path not found!");

result = p.fit_textflow(tf, llx1, lly1, urx1, ury1,
        "wrap={paths={{path=" + path + " refpoint={" + x + " " + y + "} }}}");
p.delete_path(path);
You can supply the same wrap option in multiple calls to PDF_fit_textflow( ). This is useful if the placed image overlaps multiple Textflow fitboxes, e.g. for multi-column text. Wrapping text around non-rectangular shapes. As an alternative to creating a path object as wrap shape you can specify path elements directly in Textflow options. In addition to wrapping text around a rectangle specified by a matchbox you can define arbitrary graphical elements as wrapping shapes. For example, the following option list will wrap the text around a triangular shape (see Figure 8.26):
wrap={ polygons={ {50% 80% 20% 30% 80% 30% 50% 80%} } }
Note that the showborder=true option has been used to illustrate the margins of the shapes. The wrap option can contain multiple shapes. The following option list will wrap the text around two triangle shapes:
wrap={ polygons={ {50% 80% 20% 30% 80% 30% 50% 80%} {20% 90% 10% 70% 30% 70% 20% 90%} } }
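Each polygon in these option lists repeats its first point at the end to close the shape. The following Java sketch builds polygons suboptions and closes them automatically; the class PolygonWrap is a hypothetical helper, not part of the PDFlib API, and it also covers the addfitbox keyword used for filling shapes with text.

```java
/** Hypothetical builder for the polygons suboption of the wrap option. */
final class PolygonWrap {
    private PolygonWrap() {}

    /** Formats one polygon from x/y pairs, repeating the first point if the shape is not closed. */
    static String polygon(String... xy) {
        if (xy.length < 6 || xy.length % 2 != 0)
            throw new IllegalArgumentException("need at least three x/y pairs");
        StringBuilder sb = new StringBuilder("{").append(String.join(" ", xy));
        boolean closed = xy[0].equals(xy[xy.length - 2]) && xy[1].equals(xy[xy.length - 1]);
        if (!closed) sb.append(' ').append(xy[0]).append(' ').append(xy[1]);
        return sb.append('}').toString();
    }

    /** Combines polygons; addfitbox places text inside the shapes instead of around them. */
    static String wrap(boolean addfitbox, String... polygons) {
        return "wrap={" + (addfitbox ? "addfitbox " : "") + "polygons={"
            + String.join(" ", polygons) + "}}";
    }
}
```

The two triangles above correspond to PolygonWrap.wrap(false, PolygonWrap.polygon("50%", "80%", "20%", "30%", "80%", "30%"), PolygonWrap.polygon("20%", "90%", "10%", "70%", "30%", "70%")); the resulting string uses less whitespace than the listing but contains the same suboptions.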
Instead of percentages (relative coordinates within the fitbox), absolute coordinates on the page can be used.
Note It is recommended to set fixedleading=true when using shapes with segments which are neither horizontally nor vertically oriented.
Cookbook A full code sample can be found in the Cookbook topic text_output/wrap_text_around_polygons.
Filling non-rectangular shapes. The wrap feature can also be used to place the contents of a Textflow in arbitrarily shaped areas. This is achieved with the addfitbox suboption of the wrap option. Instead of wrapping the text around the specified shapes, the text will be placed within one or more shapes. The following option list can be used to flow text into a rhombus shape, where the coordinates are provided as percentages of the fitbox rectangle (see Figure 8.27):
wrap={ addfitbox polygons={ {50% 100% 10% 50% 50% 0% 90% 50% 50% 100%} } }
Note that the showborder=true option has again been used to illustrate the margins of the shape. Without the addfitbox option the rhombus shape will remain empty and the text will be wrapped around it.
Filling overlapping shapes. In the next example we will fill a shape comprising two overlapping polygons, namely a hexagon with a rectangle inside. Using the addfitbox option the fitbox itself will be excluded from being filled, and the polygons in the subsequent list will be filled except in the overlapping area (see Figure 8.28):
wrap={ addfitbox polygons={ {20% 10% 80% 10% 100% 50% 80% 90% 20% 90% 0% 50% 20% 10%} {35% 35% 65% 35% 65% 65% 35% 65% 35% 35%} } }
Without the addfitbox option you will get the opposite effect: the previously filled area will remain empty, and the previously empty areas will be filled with text.
Cookbook A full code sample can be found in the Cookbook topic text_output/fill_polygons_with_text.
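As noted above, absolute page coordinates can replace the percentages. The following Java sketch converts percentage points into absolute coordinates for a given fitbox, assuming that percentages are measured from the lower left corner of the fitbox (llx, lly) toward (urx, ury). The class FitboxCoords is a hypothetical helper, not part of the PDFlib API.

```java
/** Hypothetical helper converting percentage polygon points to absolute coordinates. */
final class FitboxCoords {
    private FitboxCoords() {}

    /** pct holds x/y pairs in percent of the fitbox (llx, lly, urx, ury). */
    static double[] toAbsolute(double[] pct, double llx, double lly, double urx, double ury) {
        if (pct.length % 2 != 0)
            throw new IllegalArgumentException("odd number of values");
        double[] abs = new double[pct.length];
        for (int i = 0; i < pct.length; i += 2) {
            abs[i] = llx + pct[i] * (urx - llx) / 100.0;          // x
            abs[i + 1] = lly + pct[i + 1] * (ury - lly) / 100.0;  // y
        }
        return abs;
    }
}
```

For the rhombus of Figure 8.27 in a fitbox from (50, 100) to (250, 300), the top vertex 50% 100% becomes 150 300 and the left vertex 10% 50% becomes 70 200.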
Fig. 8.29 Sample table (callouts: cell spanning three columns, cell containing image and text line, simple cell, footer)
As an example, all aspects of creating the table in Figure 8.29 will be explained. A complete description of the table formatting options can be found in the PDFlib Reference. Creating a table starts by defining the contents and visual properties of each table cell with PDF_add_table_cell( ). Then you place the table using one or more calls to PDF_fit_table( ).
When placing the table, the size of its fitbox and the ruling and shading of table rows or columns can be specified. Use the Matchbox feature for details such as cell-specific shading (see Section 8.4, Matchboxes, page 214, for more information). In this section the most important options for defining the table cells and fitting the table will be discussed. All examples demonstrate the relevant calls of PDF_add_table_cell( ) and PDF_fit_table( ) only, assuming that the required font has already been loaded.
Note Table processing is independent from the current graphics state. Table cells can be defined in document scope while the actual table placement must be done in page scope.
Cookbook A full code sample can be found in the Cookbook topic tables/starter_table.
tbl = p.add_table_cell(tbl, 1, 1, "Our Paper Planes", optlist);
if (tbl == -1)
    throw new Exception("Error: " + p.get_errmsg());

/* Add a text line cell in column 1 row 2 */
tbl = p.add_table_cell(tbl, 1, 2, "Material", optlist);
if (tbl == -1)
    throw new Exception("Error: " + p.get_errmsg());

/* Add a text line cell in column 1 row 3 */
tbl = p.add_table_cell(tbl, 1, 3, "Benefit", optlist);
if (tbl == -1)
    throw new Exception("Error: " + p.get_errmsg());

/* Define the option list for a text line placed in the second column */
optlist = "fittextline={position={left center} font=" + font + " fontsize=8} "
        + "colwidth=" + c2 + " margin=4";

/* Add a text line cell in column 2 row 2 */
tbl = p.add_table_cell(tbl, 2, 2, "Offset print paper 220g/sqm", optlist);
if (tbl == -1)
    throw new Exception("Error: " + p.get_errmsg());

/* Add a Textflow */
optlist = "font=" + font + " fontsize=8 leading=110%";
tf = p.add_textflow(-1, tf_text, optlist);
if (tf == -1)
    throw new Exception("Error: " + p.get_errmsg());

/* Define the option list for the Textflow cell using the handle retrieved above */
optlist = "textflow=" + tf + " margin=4 colwidth=" + c2;

/* Add the Textflow table cell in column 2 row 3 */
tbl = p.add_table_cell(tbl, 2, 3, "", optlist);
if (tbl == -1)
    throw new Exception("Error: " + p.get_errmsg());

p.begin_page_ext(0, 0, "width=200 height=100");

/* Define the option list for fitting the table with table frame and cell ruling */
optlist = "stroke={{line=frame linewidth=0.8} {line=other linewidth=0.3}}";

/* Place the table instance */
result = p.fit_table(tbl, llx, lly, urx, ury, optlist);

/* Check the result; "_stop" means all is ok. */
if (!result.equals("_stop")) {
    if (result.equals("_error"))
        throw new Exception("Error: " + p.get_errmsg());
    else {
        /* Any other return value requires dedicated code to deal with */
    }
}
p.end_page_ext("");

/* This will also delete Textflow handles used in the table */
p.delete_table(tbl, "");
Fine-tuning the vertical alignment of cell contents. When we vertically center contents of various types in the table cells, they will be positioned with varying distance from the borders. In Figure 8.30a, the four text line cells have been placed with the following option list:
optlist = "fittextline={position={left center} font=" + font + " fontsize=8} colwidth=80 margin=4";
The Textflow cell is added without any special options. Since we vertically centered the text lines, the Benefit line will move down with the height of the Textflow.
Fig. 8.30 Aligning text lines and Textflow in table cells
As shown in Figure 8.30b, we want all cell contents to have the same vertical distance from the cell borders regardless of whether they are Textflows or text lines. To accomplish this we first prepare the option list for the text lines. We define a fixed row height of 14 points, and the position of the text line to be on the top left with a margin of 4 points. The fontsize=8 option which we supplied before doesn't exactly represent the letter height but adds some space below and above. However, the height of an uppercase letter is exactly represented by the capheight value of the font. For this reason we use fontsize={capheight=6}, which results in a font size of approximately 8 points and, along with margin=4, sums up to an overall height of 14 points corresponding to the rowheight option. The complete option list of PDF_add_table_cell( ) for our text line cells looks as follows:
/* option list for the text line cells */ optlist = "fittextline={position={left top} font=" + font + " fontsize={capheight=6}} rowheight=14 colwidth=80 margin=4";
To add the Textflow we use fontsize={capheight=6}, which results in a font size of approximately 8 points and, along with margin=4, sums up to an overall height of 14 points as for the text lines above.
/* option list for adding the Textflow */ optlist = "font=" + font + " fontsize={capheight=6} leading=110%";
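The arithmetic behind these settings can be checked quickly: cap height plus top and bottom margin gives the row height (6 + 2 × 4 = 14), and the font size follows from the font's ratio of cap height to font size. The following Java sketch assumes Helvetica, whose AFM metrics specify a cap height of 718 units per 1000, which gives about 8.36 points; other fonts have different ratios. The class CapHeightMath is a hypothetical illustration, not a PDFlib function.

```java
/** Hypothetical arithmetic behind fontsize={capheight=6}, margin=4 and rowheight=14. */
final class CapHeightMath {
    private CapHeightMath() {}

    /** Font size that yields the requested cap height, given capheight/fontsize of the font. */
    static double fontSizeFor(double capHeight, double capHeightRatio) {
        return capHeight / capHeightRatio;
    }

    /** Row height when the cap height plus top and bottom margins fill the row exactly. */
    static double rowHeight(double capHeight, double margin) {
        return capHeight + 2 * margin;
    }
}
```

With a different font, only the ratio passed to fontSizeFor( ) changes; the row height of 14 points stays the same because it depends on the cap height alone.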
In addition, we want the baseline of the Benefit text aligned with the first line of the Textflow. At the same time, the Benefit text should have the same distance from the top cell border as the Material text. To avoid any space from the top we add the Textflow cell using fittextflow={firstlinedist=capheight}. Then we add a margin of 4 points, the same as for the text lines:
/* option list for adding the Textflow cell */
optlist = "textflow=" + tf + " fittextflow={firstlinedist=capheight} "
        + "colwidth=120 margin=4";
Cookbook A full code sample can be found in the Cookbook topic tables/vertical_text_alignment.
Fig. 8.31 Contents of the table cells
Single-line text with Textlines. The text is supplied in the text parameter of PDF_add_table_cell( ). In the fittextline option all formatting options of PDF_fit_textline( ) can be specified. The default fit method is fitmethod=nofit. The cell will be enlarged if the text doesn't completely fit into the cell. To avoid this, use fitmethod=auto to shrink the text subject to the shrinklimit option. If no row height is given it will be calculated as the font size times 1.5. The same applies to the row width for rotated text.
Multi-line text with Textflow. The Textflow must have been prepared outside the table functions and created with PDF_create_textflow( ) or PDF_add_textflow( ) before calling PDF_add_table_cell( ). The Textflow handle is supplied in the textflow option. In the fittextflow option all formatting options of PDF_fit_textflow( ) can be specified. The default fit method is fitmethod=clip. This means: PDFlib first attempts to fit the text completely into the cell. If the cell is not large enough, its height will be increased. If the text still does not fit, it will be clipped at the bottom. To avoid this, use fitmethod=auto to shrink the text subject to the minfontsize option. When the cell is too narrow, the Textflow could be forced to split single words at undesired positions. If the checkwordsplitting option is true, the cell width will be enlarged until no word splitting occurs any more.
Images and templates. Images must be loaded with PDF_load_image( ) before calling PDF_add_table_cell( ). Templates must be created with PDF_begin_template_ext( ). The image or template handle is supplied in the image option. In the fitimage option all formatting options of PDF_fit_image( ) can be specified. The default fit method is fitmethod=meet. This means that the image/template will be placed completely inside the cell without distorting its aspect ratio. The cell size will not be changed due to the size of the image/template.
Pages from an imported PDF document. The PDI page must have been opened with PDF_open_pdi_page( ) before calling PDF_add_table_cell( ). The PDI page handle is supplied in the pdipage option. In the fitpdipage option all formatting options of PDF_fit_pdi_page( ) can be specified. The default fit method is fitmethod=meet. This means that the PDI page will be placed completely inside the cell without distorting its aspect ratio. The cell size will not be changed due to the size of the PDI page.
Path objects. Path objects must have been created with PDF_add_path_point( ) before calling PDF_add_table_cell( ). The path handle is supplied in the path option. In the fitpath option all formatting options of PDF_draw_path( ) can be specified. The bounding box of the path will be placed in the table cell. The lower left corner of the inner cell box will be used as reference point for placing the path.
Annotations. Annotations in table cells can be created with the annotationtype option of PDF_add_table_cell( ), which corresponds to the type parameter of PDF_create_annotation( ) (but this function does not have to be called). In the fitannotation option all options of PDF_create_annotation( ) can be specified. The cell box will be used as annotation rectangle.
Form fields. Form fields in table cells can be created with the fieldname and fieldtype options of PDF_add_table_cell( ), which correspond to the name and type parameters of PDF_create_field( ) (but this function does not have to be called). In the fitfield option all options of PDF_create_field( ) can be specified.
The cell box will be used as field rectangle.
Positioning cell contents in the inner cell box. By default, cell contents are positioned with respect to the cell box. The margin options of PDF_add_table_cell( ) can be used to specify some distance from the cell borders. The resulting rectangle is called the inner cell box. If any margin is defined, the cell contents will be placed with respect to the inner cell box (see Figure 8.32). If no margins are defined, the inner cell box is identical to the cell box. In addition, cell contents may be subject to further options supplied in the content-specific fit options, as described in Section 8.3.4, Mixed Table Contents, page 206.
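The relationship between cell box and inner cell box is simple rectangle arithmetic: each margin moves one edge inward. The following Java sketch models this with a hypothetical class CellBox (not a PDFlib type); the individual left, bottom, right and top parameters are an assumption for illustration, while the examples in this section use a single margin value for all four sides.

```java
/** Hypothetical model of a cell box and the inner cell box derived from its margins. */
final class CellBox {
    final double llx, lly, urx, ury;

    CellBox(double llx, double lly, double urx, double ury) {
        this.llx = llx; this.lly = lly; this.urx = urx; this.ury = ury;
    }

    /** Moves each edge inward by its margin; all margins zero returns an identical box. */
    CellBox inner(double left, double bottom, double right, double top) {
        CellBox b = new CellBox(llx + left, lly + bottom, urx - right, ury - top);
        if (b.llx > b.urx || b.lly > b.ury)
            throw new IllegalArgumentException("margins exceed the cell size");
        return b;
    }

    double width()  { return urx - llx; }
    double height() { return ury - lly; }
}
```

A text line cell with colwidth=80, rowheight=14 and margin=4 leaves an inner cell box of 72 by 6 points, which is exactly the cap height requested with fontsize={capheight=6}.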
Fig. 8.33 Simple cells and cells spanning several rows or columns
Furthermore you can explicitly supply the width of the first column spanned by the cell with the colwidth option. If each cell supplies a defined first column width, these width values implicitly add up to the total table width. Figure 8.34 shows an example.
Fig. 8.34 Column widths of 50, 100, and 90 adding up to a total table width of 240
Alternatively, you can specify the column widths as percentages if appropriate. In this case the percentages refer to the width of the table's fitbox. Either none or all column widths must be supplied as percentages. If some columns are combined into a column scaling group with the colscalegroup option of PDF_add_table_cell( ), their widths will be adjusted to the widest column in the group (see Figure 8.35).
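The effect of a column scaling group can be sketched as "every member gets the maximum width of the group". The following Java sketch is a hypothetical illustration of that rule (the class ColScaleGroup and its 0-based column indices are not part of the PDFlib API) and ignores how PDFlib measures the individual column widths.

```java
/** Hypothetical sketch of colscalegroup: all columns in a group get the widest member's width. */
final class ColScaleGroup {
    private ColScaleGroup() {}

    /** Returns a copy of widths in which the listed columns (0-based) share the maximum width. */
    static double[] equalize(double[] widths, int... group) {
        double max = 0;
        for (int col : group) max = Math.max(max, widths[col]);
        double[] result = widths.clone();
        for (int col : group) result[col] = max;
        return result;
    }
}
```

Columns outside the group keep their widths, which matches the situation in Figure 8.35 where only the last four cells of the first row form the group.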
Fig. 8.35 The last four cells in the first row are in the same column scaling group. They will have the same widths.
If absolute coordinates are used (as opposed to percentages) and there are cells left without any column width defined, the missing widths are calculated as follows: First, for each cell containing a text line the actual width is calculated based on the column width or the text width (or the text height in case of rotated text). Then, the remaining table width is evenly distributed among the column widths which are still missing.
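The last step of this calculation can be sketched in a few lines of Java. The class ColumnWidths is a hypothetical illustration, not PDFlib code; it covers only the even distribution of the remaining width and skips the preceding step that derives widths from text line cells.

```java
/** Hypothetical sketch: spreading leftover table width evenly over columns without a width. */
final class ColumnWidths {
    private ColumnWidths() {}

    /** widths[i] == null means "not defined"; returns a new array with all widths filled in. */
    static double[] fillMissing(Double[] widths, double tableWidth) {
        double used = 0;
        int missing = 0;
        for (Double w : widths) {
            if (w == null) missing++; else used += w;
        }
        double share = missing == 0 ? 0 : (tableWidth - used) / missing;
        if (share < 0)
            throw new IllegalArgumentException("defined widths exceed the table width");
        double[] result = new double[widths.length];
        for (int i = 0; i < widths.length; i++) {
            result[i] = widths[i] == null ? share : widths[i];
        }
        return result;
    }
}
```

With a table width of 240 and defined widths of 50 and 90 for the first and third columns, the missing middle column receives 100, the same values as in Figure 8.34.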
Step 1: Adding the first cell. We start with the first cell of our table. The cell will be placed in the first column of the first row and will span three columns. The first column has a width of 50 points. The text line is centered vertically and horizontally, with a margin of 4 points from all borders. The following code fragment shows how to add the first cell:
optlist = "fittextline={font=" + boldfont + " fontsize=12 position=center} "
        + "margin=4 colspan=3 colwidth=" + c1;
tbl = p.add_table_cell(tbl, 1, 1, "Our Paper Plane Models", optlist);
if (tbl == -1)
    throw new Exception("Error: " + p.get_errmsg());
Step 2: Adding one cell spanning two columns. In the next step we add the cell containing the text line 1 Giant Wing. It will be placed in the first column of the second row and spans two columns. The first column has a width of 50 points. The row height is 14 points. The text line is positioned on the top left, with a margin of 4 points from all borders. We use fontsize={capheight=6} to get a uniform vertical text alignment as described in the section Fine-tuning the vertical alignment of cell contents, page 202. Since the Giant Wing heading cell doesn't cover a complete row but only two of three columns, it cannot be filled with color using one of the row-based shading options. We apply the Matchbox feature instead to fill the rectangle covered by the cell with a gray background color. (The Matchbox feature is discussed in detail in Section 8.4, Matchboxes, page 214.) The following code fragment demonstrates how to add the Giant Wing heading cell:
optlist = "fittextline={position={left top} font=" + boldfont
        + " fontsize={capheight=6}} rowheight=14 colwidth=" + c1
        + " margin=4 colspan=2 matchbox={fillcolor={gray .92}}";
tbl = p.add_table_cell(tbl, 1, 2, "1 Giant Wing", optlist);
if (tbl == -1)
    throw new Exception("Error: " + p.get_errmsg());

Fig. 8.36 Adding table cells with various contents step by step (generation steps: Step 1: Add a cell spanning 3 columns; Step 2: Add a cell spanning 2 columns; Step 3: Add 3 more text line cells; Step 4: Add the Textflow cell; Step 5: Add the image cell with a text line; Step 6: Fitting the table)
Step 3: Add three more Textline cells. The following code fragment adds the Material, Benefit and Offset print paper... cells. The Offset print paper... cell will start in the second column, defining a column width of 120 points. The cell contents are positioned on the top left, with a margin of 4 points from all borders.
optlist = "fittextline={position={left top} font=" + normalfont
        + " fontsize={capheight=6}} rowheight=14 colwidth=" + c1 + " margin=4";
tbl = p.add_table_cell(tbl, 1, 3, "Material", optlist);
if (tbl == -1)
    throw new Exception("Error: " + p.get_errmsg());
tbl = p.add_table_cell(tbl, 1, 4, "Benefit", optlist);
if (tbl == -1)
    throw new Exception("Error: " + p.get_errmsg());

optlist = "fittextline={position={left top} font=" + normalfont
        + " fontsize={capheight=6}} rowheight=14 colwidth=" + c2 + " margin=4";
tbl = p.add_table_cell(tbl, 2, 3, "Offset print paper 220g/sqm", optlist);
if (tbl == -1)
    throw new Exception("Error: " + p.get_errmsg());
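Steps 2 and 3 repeat the same text line option list with only the font, the column width and a few extra options changing. The following Java sketch factors this into a hypothetical helper CellOptions.textline( ) (not part of the PDFlib API). It assumes font handles are plain int values and formats whole-number widths without a decimal point so that the output matches the option lists above.

```java
/** Hypothetical helper producing the text line cell option lists used in Steps 2 and 3. */
final class CellOptions {
    private CellOptions() {}

    /** Top-left text line with capheight=6, rowheight=14 and margin=4; extra is appended verbatim. */
    static String textline(int font, double colwidth, String extra) {
        String s = "fittextline={position={left top} font=" + font
                 + " fontsize={capheight=6}} rowheight=14 colwidth=" + fmt(colwidth)
                 + " margin=4";
        return extra.isEmpty() ? s : s + " " + extra;
    }

    /** Prints whole numbers without a trailing ".0". */
    private static String fmt(double v) {
        return v == Math.rint(v) ? Long.toString((long) v) : Double.toString(v);
    }
}
```

For the Giant Wing heading cell, CellOptions.textline(boldfont, c1, "colspan=2 matchbox={fillcolor={gray .92}}") yields the option list of Step 2.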
Step 4: Add the Textflow cell. The following code fragment adds the It is amazingly... Textflow cell. To add a table cell containing a Textflow we first add the Textflow. We use fontsize={capheight=6}, which results in a font size of approximately 8 points and, along with margin=4, sums up to an overall height of 14 points as for the text lines above.
tftext = "It is amazingly robust and can even do aerobatics. "
       + "But it is best suited to gliding.";
optlist = "font=" + normalfont + " fontsize={capheight=6} leading=110%";
tf = p.add_textflow(-1, tftext, optlist);
if (tf == -1)
    throw new Exception("Error: " + p.get_errmsg());
The retrieved Textflow handle will be used when adding the table cell. The first line of the Textflow should be aligned with the baseline of the Benefit text line. At the same time, the Benefit text should have the same distance from the top cell border as the Material text. Add the Textflow cell using fittextflow={firstlinedist=capheight} to avoid any space from the top. Then add a margin of 4 points, the same as for the text lines.
optlist = "textflow=" + tf + " fittextflow={firstlinedist=capheight} "
        + "colwidth=" + c2 + " margin=4";
tbl = p.add_table_cell(tbl, 2, 4, "", optlist);
if (tbl == -1)
    throw new Exception("Error: " + p.get_errmsg());
Step 5: Add the image cell with a text line. In the fifth step we add a cell containing an image of the Giant Wing paper plane as well as the Amazingly robust! text line. The cell will start in the third column of the second row and span three rows. The column width is 90 points. The cell margins are set to 4 points. For a first variant we place a TIFF image in the cell:
image = p.load_image("auto", "kraxi_logo.tif", "");
if (image == -1)
    throw new Exception("Error: " + p.get_errmsg());
optlist = "fittextline={font=" + boldfont + " fontsize=9} image=" + image
        + " colwidth=" + c3 + " rowspan=3 margin=4";
tbl = p.add_table_cell(tbl, 3, 2, "Amazingly robust!", optlist);
if (tbl == -1)
    throw new Exception("Error: " + p.get_errmsg());
Alternatively, you could import the image as a PDF page. Make sure that the PDI page is closed only after the call to PDF_fit_table( ).
int doc = p.open_pdi("kraxi_logo.pdf", "", 0);
if (doc == -1)
    throw new Exception("Error: " + p.get_errmsg());
page = p.open_pdi_page(doc, pageno, "");
if (page == -1)
    throw new Exception("Error: " + p.get_errmsg());
optlist = "fittextline={font=" + boldfont + " fontsize=9} pdipage=" + page
        + " colwidth=" + c3 + " rowspan=3 margin=4";
tbl = p.add_table_cell(tbl, 3, 2, "Amazingly robust!", optlist);
if (tbl == -1)
    throw new Exception("Error: " + p.get_errmsg());
Step 6: Fit the table. In the last step we place the table with PDF_fit_table( ). Using header=1 the table header will include the first row. The fill option and the suboptions area=header and fillcolor={rgb 0.8 0.8 0.87} specify the header row(s) to be filled with the supplied color. Using the stroke option and the suboptions line=frame linewidth=0.8 we define a ruling of the table frame with a line width of 0.8. Using line=other linewidth=0.3 a ruling of all cells is specified with a line width of 0.3.
optlist = "header=1 fill={{area=header fillcolor={rgb 0.8 0.8 0.87}}} "
        + "stroke={{line=frame linewidth=0.8} {line=other linewidth=0.3}}";
result = p.fit_table(tbl, llx, lly, urx, ury, optlist);
if (result.equals("_error"))
    throw new Exception("Error: " + p.get_errmsg());
p.end_page_ext("");
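The option list of Step 6 combines three independent settings: header rows, header fill color and the two ruling line widths. The following Java sketch assembles it with a hypothetical helper FitTableOptions.build( ) (not part of the PDFlib API); it relies on Double.toString( ) printing 0.8 and 0.3 without extra digits.

```java
/** Hypothetical builder for the PDF_fit_table( ) option list used in Step 6. */
final class FitTableOptions {
    private FitTableOptions() {}

    /** headerColor is a PDFlib color spec such as "rgb 0.8 0.8 0.87". */
    static String build(int headerRows, String headerColor, double frameWidth, double otherWidth) {
        return "header=" + headerRows
             + " fill={{area=header fillcolor={" + headerColor + "}}}"
             + " stroke={{line=frame linewidth=" + frameWidth + "}"
             + " {line=other linewidth=" + otherWidth + "}}";
    }
}
```

FitTableOptions.build(1, "rgb 0.8 0.8 0.87", 0.8, 0.3) returns the same string as the optlist of Step 6.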
The following code fragment shows the general loop for fitting table instances until the table has been placed completely. New pages are created as long as more table instances need to be placed.
do {
    /* Create a new page */
    p.begin_page_ext(0, 0, "width=a4.width height=a4.height");

    /* Use the first row as header and draw lines for all table cells */
    optlist = "header=1 stroke={{line=other}}";

    /* Place the table instance */
    result = p.fit_table(tbl, llx, lly, urx, ury, optlist);
    if (result.equals("_error"))
        throw new Exception("Error: " + p.get_errmsg());

    p.end_page_ext("");
} while (result.equals("_boxfull"));

/* Check the result; "_stop" means all is ok. */
if (!result.equals("_stop")) {
    if (result.equals("_error"))
        throw new Exception("Error: " + p.get_errmsg());
    else {
        /* Any other return value is a user exit caused by the "return" option;
         * this requires dedicated code to deal with.
         */
        throw new Exception("User return found in Textflow");
    }
}

/* This will also delete Textflow handles used in the table */
p.delete_table(tbl, "");
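The control flow of this loop can be tested without PDFlib by replacing the begin_page_ext( )/fit_table( )/end_page_ext( ) sequence with a stand-in. In the following Java sketch the interface TableLoop.Fitter is hypothetical: each call represents placing one table instance on a new page and returns the result string. The loop logic mirrors the fragment above: continue on "_boxfull", fail on "_error", and treat anything other than "_stop" as a user return.

```java
/** Hypothetical, PDFlib-free model of the table placement loop. */
final class TableLoop {
    private TableLoop() {}

    /** Stand-in for one fit_table() call on a fresh page; returns the result string. */
    interface Fitter { String fitOnNewPage(); }

    /** Returns the number of pages used; throws on "_error" or on a user return. */
    static int placeAll(Fitter fitter) {
        int pages = 0;
        String result;
        do {
            result = fitter.fitOnNewPage();
            pages++;
            if (result.equals("_error"))
                throw new IllegalStateException("fit_table reported _error");
        } while (result.equals("_boxfull"));
        if (!result.equals("_stop"))
            throw new IllegalStateException("user return: " + result);
        return pages;
    }
}
```

A stand-in that returns "_boxfull" twice and then "_stop" corresponds to a table spread over three pages, as in the Page 1 to Page 3 example.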
Headers and footers. With the header and footer options of PDF_fit_table( ) you can define the number of initial or final table rows which will be placed at the top or bottom of a table instance. Using the fill option with area=header or area=footer, headers and footers can be individually filled with color. Header rows consist of the first n rows of the table definition and footer rows of the last m rows. Headers and footers are specified per table instance in PDF_fit_table( ). Consequently, they can differ among table instances: while some table instances include headers/footers, others can omit them, e.g. to specify a special row in the last table instance. Joining rows. In order to ensure that a set of rows will be kept together in the same table instance, they can be assigned to the same row join group using the rowjoingroup option. The row join group contains multiple consecutive rows. All rows in the group will be prevented from being separated into multiple table instances. The rows of a cell spanning these rows dont constitute a join group automatically. Fitbox too low. If the fitbox is too low to hold the required header and footer rows, and at least one body row or row join group the row heights will be decreased uniformly until the table fits into the fitbox. However, if the required shrinking factor is smaller than the limit set in vertshrinklimit, no shrinking will be performed and PDF_fit_table( ) will return the string _error instead, or the respective error message. In order to avoid any shrinking use vertshrinklimit=100%. Fitbox too narrow. The coordinates of the tables fitbox are explicitly supplied in the call to PDF_fit_table( ). If the actual table width as calculated from the sum of the supplied column widths exceeds the tables fitbox, all columns will be reduced until the table fits into the fitbox. 
However, if the required shrinking factor is smaller than the limit set in horshrinklimit, no shrinking will be performed and PDF_fit_table( ) will return the string _error instead, or the respective error message. In order to avoid any shrinking use horshrinklimit=100%.

Splitting a cell. If the last rows spanned by a cell don't fit into the fitbox, the cell will be split. In case of an image, PDI page, or Textline cell, the cell contents will be repeated in the next table instance. In case of a Textflow cell, the cell contents will continue in the remaining rows of the cell.
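To make the header/footer and fitbox rules concrete, here is a simplified model in Java, the language of the code samples in this chapter. It is not PDFlib code and calls no PDFlib function. The class TableInstances and its paginate( ) method are hypothetical. Rows are assumed to have fixed heights, and row join groups, row splitting via minrowheight, and vertical shrinking are deliberately left out. The sketch only shows how repeating n header rows and m footer rows in every table instance reduces the space available for body rows.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical model (not PDFlib code) of how body rows are distributed over
 * table instances when header and footer rows are repeated in every instance.
 */
final class TableInstances {
    private TableInstances() {}

    /**
     * @param rowHeights heights of all rows in table definition order
     * @param header     number n of initial rows used as header
     * @param footer     number m of final rows used as footer
     * @param fitboxH    height of every fitbox
     * @return one {firstBodyRow, endBodyRowExclusive} pair per table instance
     */
    static List<int[]> paginate(double[] rowHeights, int header, int footer, double fitboxH) {
        int end = rowHeights.length - footer;          // body rows are [header, end)
        double fixed = 0;                              // header + footer, repeated per instance
        for (int i = 0; i < header; i++) fixed += rowHeights[i];
        for (int i = end; i < rowHeights.length; i++) fixed += rowHeights[i];

        List<int[]> instances = new ArrayList<>();
        int row = header;
        while (row < end) {
            int first = row;
            double used = fixed;
            // Place body rows until the next one would overflow the fitbox.
            while (row < end && used + rowHeights[row] <= fitboxH) {
                used += rowHeights[row++];
            }
            if (row == first) {
                // Header, footer and one body row don't fit. PDFlib would try to shrink
                // the rows (limited by vertshrinklimit); this sketch simply gives up.
                throw new IllegalStateException("fitbox too low");
            }
            instances.add(new int[] { first, row });
        }
        return instances;
    }
}
```

For six rows of heights 20, 30, 30, 30, 30, 10 with one header row, one footer row, and a fitbox height of 100, the sketch yields two table instances holding body rows 1-2 and 3-4.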
Figure 8.38 shows how the Textflow cell will be split while the Textflow continues in the next row. In Figure 8.39, an image cell is shown which will be repeated in the first row of the next table instance.
Splitting a row. If the last body row doesn't completely fit into the table's fitbox, it will usually not be split. This behavior is controlled by the minrowheight option of PDF_fit_table( ), which has a default value of 100%. With this default setting the row will not be split but will be placed completely in the next table instance. You can decrease the minrowheight value to split the last body row, placing the given percentage of its contents in the first instance and the remaining parts of the row in the next instance. Figure 8.39 illustrates how the Textflow "It's amazingly robust..." is split and continued in the first body row of the next table instance. The image cell spanning several rows will be split and the image will be repeated. The Textline "Benefit" will be repeated as well.
Fig. 8.39 Splitting a row (figure panels: table instance 1, table instance 2)
text the width of the widest character will be used as cell width. For text orientated to west or east, twice the text height will be used as cell width. The resulting width and height of the table cell is then distributed evenly among all those columns and rows spanned by the cell for which colwidth or rowheight hasn't been specified.

Calculating a tentative table size. In the next step the formatter calculates a tentative table width and height as the sum of all column widths and row heights, respectively. Column widths and row heights specified as percentages are converted to absolute values based on the width and height of the first fitbox. If there are still columns or rows without colwidth or rowheight, the remaining space is evenly distributed until the tentative table size equals the first fitbox. In most cases it therefore makes sense to specify at least a minimum rowheight for each table cell, since otherwise the table will automatically be adjusted to the height of the fitbox.

Enlarging cells which are too small. Now the formatter determines all inner cell boxes (see Figure 8.32). If the combined margins are larger than the cell's width or height, the cell box is suitably enlarged by evenly enlarging all columns and rows which belong to the cell.

Fitting Textlines horizontally. The formatter attempts to increase the width of all cells with Textline so that the Textline fits into the cell without reducing the font size. If this is not possible, the Textline is automatically placed with fitmethod=auto. This guarantees that the Textline will not extend beyond the inner cell box. You can prevent the cell width from being increased by setting fitmethod=auto in the fittextline option. You can use the colscalegroup option to make sure that all columns which belong to the same column scaling group will be scaled to equal widths, i.e. their widths will be unified and adjusted to the widest column in the group (see Figure 8.35).

Avoiding forced hyphenation. If the calculated table width is smaller than the fitbox, the formatter tries to increase the width of a Textflow cell so that the text fits without forced hyphenation. This can be avoided with the option checkwordsplitting=false. The widths of such cells will be increased until the table width equals the width of the fitbox. You can query the difference between table width and fitbox width with the horboxgap key of PDF_info_table( ).

Fitting text vertically. The formatter attempts to increase the height of all Textline and Textflow cells so that the Textline or Textflow fits into the inner cell box without reducing the font size. However, the cell height will not be increased if the suboption fitmethod=auto is set for a Textline or Textflow, or if a Textflow is continued in another cell with the continuetextflow option. This process of increasing the cell height applies only to cells containing a Textline or Textflow, but not to other types of cell contents, i.e. images, PDI pages, path objects, annotations, and fields. You can use the rowscalegroup option to make sure that all rows which belong to the same row scaling group will be scaled to equal heights.
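The tentative width calculation described above can be sketched for columns as follows; row heights follow the same pattern. ColumnWidths is a hypothetical helper, not part of PDFlib. It assumes that column specifications arrive as strings: an absolute width such as "80", a percentage of the first fitbox width such as "25%", or null for a column without colwidth. Minimum widths derived from cell contents are ignored here.

```java
/** Hypothetical sketch of the tentative column width calculation. */
final class ColumnWidths {
    private ColumnWidths() {}

    /**
     * @param specs       per column: absolute width ("80"), percentage of the
     *                    first fitbox width ("25%"), or null if colwidth is missing
     * @param fitboxWidth width of the first fitbox
     */
    static double[] tentative(String[] specs, double fitboxWidth) {
        double[] widths = new double[specs.length];
        double assigned = 0;
        int unspecified = 0;
        for (int i = 0; i < specs.length; i++) {
            String s = specs[i];
            if (s == null) {
                unspecified++;
                continue;
            }
            // Percentages are converted to absolute values based on the fitbox width.
            widths[i] = s.endsWith("%")
                ? fitboxWidth * Double.parseDouble(s.substring(0, s.length() - 1)) / 100
                : Double.parseDouble(s);
            assigned += widths[i];
        }
        if (unspecified > 0) {
            // Distribute the remaining space evenly so that the table fills the fitbox;
            // if nothing is left, a later shrinking step has to deal with the overflow.
            double share = Math.max(0, fitboxWidth - assigned) / unspecified;
            for (int i = 0; i < specs.length; i++) {
                if (specs[i] == null) widths[i] = share;
            }
        }
        return widths;
    }
}
```

With the specifications "100", "25%", and two unspecified columns in a fitbox 400 units wide, each of the four columns ends up 100 units wide.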
Continuing the table in the next fitbox. If the table's resulting total height is larger than the fitbox (i.e. not all table cells fit into the fitbox), the formatter stops placing rows in the fitbox before it encounters the first row which doesn't fit into the fitbox. If a cell spans multiple rows and not all of those rows fit into the fitbox, this cell will be split. If the cell contains an image, PDI page, path object, annotation, form field, or Textline, the cell contents will be repeated in the next fitbox unless repeatcontent=false has been specified. Textflows, however, will be continued in the subsequent rows spanned by the cell (see Figure 8.38). You can use the rowjoingroup option to make sure that all rows belonging to a row joining group will always appear together in a fitbox. All rows which belong to the header or footer plus one body row automatically form a row joining group. The formatter may therefore stop placing table rows before it encounters the first row which doesn't fit into the fitbox (see Figure 8.37). You can use the return option to make sure that no more rows will be placed in the table instance after placing the affected row.

Splitting a row. A row may be split if it is very high or if there is only a single body row. If the last body row doesn't fully fit into the table's fitbox, it will completely be moved to the next fitbox. This behavior is controlled by the minrowheight option of PDF_fit_table( ), which has a default value of 100%. If you reduce the minrowheight value, the specified percentage of the content of the last body row will be placed in the current fitbox and the rest of the row in the next fitbox (see Figure 8.39). You can check whether a row has been split with the rowsplit key of PDF_info_table( ).

Adjusting the calculated table width. The calculated table width may be larger than the fitbox width after one of the determination steps, e.g. after fitting a Textline horizontally. In this case all column widths will be evenly reduced until the table width equals the width of the fitbox. This shrinking process is limited by the horshrinklimit option. You can query the horizontal shrinking factor with the horshrinking key of PDF_info_table( ). If the horshrinklimit threshold is exceeded, the following error message appears:
Calculated table width $1 is too large (> $2, shrinking $3)
Here $1 designates the calculated table width, $2 the maximum possible width, and $3 the horshrinklimit value.

Adjusting the table size to a small fitbox. If the table width which has been calculated for the previous fitbox is too large for the current fitbox, the formatter evenly reduces all columns until the table width equals the width of the current fitbox. The cell contents will not be adjusted, however. In order to calculate the table width anew, call PDF_fit_table( ) with rewind=1.
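The horizontal shrinking step and its limit can be sketched as follows. This assumes that "evenly reduced" means scaling all columns by a common factor, and that horshrinklimit is given in percent. The class HorizontalShrink is hypothetical; it only mirrors the structure of the error text quoted above and does not reproduce PDFlib's exact message formatting.

```java
import java.util.Arrays;

/** Hypothetical sketch of horizontal table shrinking limited by horshrinklimit. */
final class HorizontalShrink {
    private HorizontalShrink() {}

    /**
     * @param colWidths             calculated column widths
     * @param fitboxWidth           width of the fitbox
     * @param horShrinkLimitPercent smallest acceptable scaling factor in percent
     * @return the (possibly reduced) column widths
     */
    static double[] shrink(double[] colWidths, double fitboxWidth, double horShrinkLimitPercent) {
        double tableWidth = Arrays.stream(colWidths).sum();
        if (tableWidth <= fitboxWidth) {
            return colWidths.clone();                      // table already fits
        }
        double factor = fitboxWidth / tableWidth;          // common scaling factor, < 1
        if (factor * 100 < horShrinkLimitPercent) {
            // $1 = calculated width, $2 = maximum width, $3 = horshrinklimit value
            throw new IllegalStateException("Calculated table width " + tableWidth
                + " is too large (> " + fitboxWidth + ", shrinking " + horShrinkLimitPercent + ")");
        }
        double[] result = new double[colWidths.length];
        for (int i = 0; i < colWidths.length; i++) {
            result[i] = colWidths[i] * factor;             // every column shrinks by the same factor
        }
        return result;
    }
}
```

Passing 100 as the limit rejects any shrinking, which matches the advice to use horshrinklimit=100% when no shrinking is acceptable.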
8.4 Matchboxes
Matchboxes provide access to coordinates calculated by PDFlib as a result of placing some content on the page. Matchboxes are not defined with a dedicated function, but with the matchbox option in the function call which places the actual element, for example PDF_fit_textline( ) and PDF_fit_image( ). Matchboxes can be used for various purposes:
> Matchboxes can be decorated, e.g. filled with color or surrounded by a frame.
> Matchboxes can be used to automatically create one or more annotations with PDF_create_annotation( ).
> Matchboxes define the height of a text line which will be fit into a box with PDF_fit_textline( ) or the height of a text fragment in a Textflow which will be decorated (boxheight option).
> Matchboxes define the clipping for an image.
> The coordinates of the matchbox and other properties can be queried with PDF_info_matchbox( ) to perform some other task, e.g. insert an image.

For each element PDFlib will calculate the matchbox as a rectangle corresponding to the bounding box which describes the position of the element on the page (as specified by all relevant options). For Textflows and table cells a matchbox may consist of multiple rectangles because of line or row breaking.

The rectangle(s) of a matchbox will be drawn before drawing the element to be placed. As a result, the element may obscure the effect of the matchbox border or filling, but not vice versa. In particular, those parts of the matchbox which overlap the area covered by an image are hidden by the image. If the image is placed with fitmethod=slice or fitmethod=clip, the matchbox borders outside the image fitbox will be clipped as well. To avoid this effect the matchbox rectangle can be drawn using the basic drawing functions, e.g. PDF_rect( ), after the PDF_fit_image( ) call. The coordinates of the matchbox rectangle can be retrieved using PDF_info_matchbox( ), provided the matchbox has been given a name in the PDF_fit_image( ) call.
In the following sections some examples for using matchboxes are shown. For details about the functions which support the matchbox option list, see the PDFlib Reference.
You can omit the boxheight option since boxheight={capheight none} is the default setting. It will look better if we use the boxheight option to increase the box height so that it also covers the descenders (see Figure 8.40b).
To increase the box height to match the font size we can use boxheight={fontsize descender} (see Figure 8.40c). In the next step we extend the matchbox by some offsets to the left, right and bottom to make the distance between text and box margins the same. In addition, we draw a rectangle around the matchbox by specifying the border width (see Figure 8.40d). Cookbook A full code sample can be found in the Cookbook topic text_output/text_on_color.
Fig. 8.40 Decorating a text line using a matchbox with various suboptions
Generated output: the text line "Giant Wing Paper Plane" in variants a) to d)
Suboptions of the matchbox option of PDF_fit_textline( ):
a) boxheight={capheight none}
b) boxheight={ascender descender}
c) boxheight={fontsize descender}
d) boxheight={fontsize descender} borderwidth=0.3 offsetleft=-2 offsetright=2 offsetbottom=-2
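The effect of the boxheight and offset suboptions on the rectangle can be approximated with a little arithmetic. In this hypothetical sketch (TextMatchbox is not a PDFlib function), the caller supplies the distances above and below the baseline, e.g. the font size and an assumed descender depth for boxheight={fontsize descender}; PDFlib itself derives these values from the font metrics. The sign convention follows Figure 8.40d, where negative offsetleft and offsetbottom values and a positive offsetright value enlarge the box.

```java
/**
 * Hypothetical sketch (not the PDFlib API): the rectangle of a single text line
 * matchbox, given the box extent above and below the baseline plus the offsets.
 */
final class TextMatchbox {
    private TextMatchbox() {}

    /**
     * @param x         start of the text line
     * @param baseline  y coordinate of the baseline
     * @param textWidth width of the text line
     * @param above     box top above the baseline, e.g. the font size
     * @param below     box bottom below the baseline as a positive depth, e.g. the descender depth
     * @return {llx, lly, urx, ury}
     */
    static double[] rect(double x, double baseline, double textWidth,
                         double above, double below,
                         double offsetLeft, double offsetRight,
                         double offsetBottom, double offsetTop) {
        return new double[] {
            x + offsetLeft,                    // negative offsetleft extends to the left
            baseline - below + offsetBottom,   // negative offsetbottom extends downwards
            x + textWidth + offsetRight,       // positive offsetright extends to the right
            baseline + above + offsetTop       // positive offsettop extends upwards
        };
    }
}
```

For a text line 100 units wide starting at (50, 700) with a font size of 12, an assumed descender depth of 3, and the offsets of Figure 8.40d, the sketch returns the rectangle (48, 695, 152, 712).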
Adding a Web link to the Textflow matchbox. Now we will add a Web link to parts of a Textflow. In the first step we create the Textflow with a matchbox called kraxi indicating the text part to be linked. Second, we will create the action for opening a URL. Third, we create an annotation of type Link with an invisible frame. In its option list we reference the kraxi matchbox to be used as the link's rectangle (the rectangle coordinates in PDF_create_annotation( ) will be ignored). Cookbook A full code sample can be found in the Cookbook topic text_output/weblink_in_textflow.
/* create and fit Textflow with matchbox "kraxi" */
String tftext =
    "For more information about the Giant Wing Paper Plane see the Web site of " +
    "<underline=true matchbox={name=kraxi boxheight={fontsize descender}}>" +
    "Kraxi Systems, Inc.<matchbox=end underline=false>";
String optlist = "font=" + normalfont + " fontsize=8 leading=110%";

tflow = p.create_textflow(tftext, optlist);
if (tflow == -1)
    throw new Exception("Error: " + p.get_errmsg());
result = p.fit_textflow(tflow, 0, 0, 50, 70, "fitmethod=auto");
if (!result.equals("_stop")) {
    /* ... */
}

/* create URI action */
optlist = "url={http://www.kraxi.com}";
act = p.create_action("URI", optlist);

/* create Link annotation on matchbox "kraxi" */
optlist = "action={activate " + act + "} linewidth=0 usematchbox={kraxi}";
p.create_annotation(0, 0, 0, 0, "Link", optlist);
Even if the text "Kraxi Systems, Inc." spans several lines, the appropriate number of link annotations will be created automatically with a single call to PDF_create_annotation( ). The result is shown in Figure 8.42.
Fig. 8.42 Add Weblinks to parts of a Textflow (generated output: "For information about Giant Wing Paper Planes see the Web site of Kraxi Systems, Inc.")
Cookbook A full code sample can be found in the Cookbook topic interactive/link_annotations.

Drawing a frame around an image. In this example we want to use the image matchbox to draw a frame around the image. We completely fit the image into the supplied box while maintaining its proportions using fitmethod=meet. We use the matchbox option with the borderwidth suboption to draw a thick border around the image. The strokecolor suboption determines the border color, and the linecap and linejoin suboptions are used to round the corners. Cookbook A full code sample can be found in the Cookbook topic images/frame_around_image. The matchbox is always drawn before the image, which means it would be partially hidden by the image. To avoid this we use the offset suboptions with 50 percent of the border width to enlarge the frame beyond the area covered by the image. Alternatively, we could increase the border width accordingly. Figure 8.43 shows the option list used with PDF_fit_image( ) to draw the frame.
Fig. 8.43 Using the image matchbox to draw a frame around the image
Option list for PDF_fit_image( ):
boxsize={60 60} position={center} fitmethod=meet matchbox={name=kraxi borderwidth=4 offsetleft=-2 offsetright=2 offsetbottom=-2 offsettop=2 linecap=round linejoin=round strokecolor={rgb 0.0 0.3 0.3}}
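Since the offsets are simply half the border width, the option list can be derived from the border width alone. The helper below is a hypothetical sketch: it assumes the border is stroked centered on the matchbox edge, so that shifting each edge outward by half the border width keeps the whole border outside the image. The values for boxsize, position, and fitmethod are fixed to those of Figure 8.43.

```java
/** Hypothetical helper that builds the PDF_fit_image() option list of Figure 8.43. */
final class FrameOptions {
    private FrameOptions() {}

    /** Formats 4.0 as "4" and 1.5 as "1.5" so the option list stays readable. */
    static String num(double v) {
        return v == Math.rint(v) ? Long.toString((long) v) : Double.toString(v);
    }

    /**
     * @param name        matchbox name
     * @param borderwidth frame width; half of it is used for each offset
     * @param strokecolor color specification such as "rgb 0.0 0.3 0.3"
     */
    static String frameOptlist(String name, double borderwidth, String strokecolor) {
        double half = borderwidth / 2;   // push each edge outward by half the stroke
        return "boxsize={60 60} position={center} fitmethod=meet "
            + "matchbox={name=" + name
            + " borderwidth=" + num(borderwidth)
            + " offsetleft=" + num(-half)
            + " offsetright=" + num(half)
            + " offsetbottom=" + num(-half)
            + " offsettop=" + num(half)
            + " linecap=round linejoin=round strokecolor={" + strokecolor + "}}";
    }
}
```

Called with the name kraxi, a border width of 4, and the color rgb 0.0 0.3 0.3, it produces exactly the option list shown in Figure 8.43.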
Align text at an image. The following code fragment shows how to align vertical text at the right margin of an image. The image is proportionally fit into the supplied box with a fit method of meet. The actual coordinates of the fitbox are retrieved with PDF_info_matchbox( ), and a vertical text line is placed relative to the lower right (x2, y2) corner of the fitbox. The border of the matchbox is stroked (see Figure 8.44). Cookbook A full code sample can be found in the Cookbook topic images/align_text_at_image.
/* use this option list to load and fit the image */
String optlist = "boxsize={300 200} position={center} fitmethod=meet " +
    "matchbox={name=giantwing borderwidth=3 strokecolor={rgb 0.85 0.83 0.85}}";

/* load and fit the image */
...

/* retrieve the coordinates of the lower right (second) matchbox corner */
double x2 = 0, y2 = 0;
if ((int) p.info_matchbox("giantwing", 1, "exists") == 1) {
    x2 = p.info_matchbox("giantwing", 1, "x2");
    y2 = p.info_matchbox("giantwing", 1, "y2");
}

/* start the text line at that corner with a small distance of 2 */
p.fit_textline("Foto: Kraxi", x2+2, y2+2, "font=" + font + " fontsize=8 orientate=west");

Fig. 8.44 Use the coordinates of the image matchbox to fit a text line
Generation steps:
Step 1: Fit image with matchbox
Step 2: Retrieve matchbox info for coordinates (x2, y2)
Step 3: Fit text line "Foto: Kraxi" starting at retrieved coordinates (x2, y2) with option orientate=west
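To see where the corner (x2, y2) ends up, the sketch below computes it under two assumptions: fitmethod=meet scales the image uniformly by the smaller of the ratios box width / image width and box height / image height, and position={center} centers the scaled image in the box. MeetFit is a hypothetical helper; in real code the coordinates should be taken from PDF_info_matchbox( ) as shown above.

```java
/** Hypothetical sketch of fitmethod=meet combined with position={center}. */
final class MeetFit {
    private MeetFit() {}

    /** Returns {llx, lly, urx, ury} of an imgW x imgH image fitted into the box. */
    static double[] fit(double boxX, double boxY, double boxW, double boxH,
                        double imgW, double imgH) {
        double scale = Math.min(boxW / imgW, boxH / imgH);  // largest scale that still fits
        double w = imgW * scale;
        double h = imgH * scale;
        double llx = boxX + (boxW - w) / 2;                  // center horizontally
        double lly = boxY + (boxH - h) / 2;                  // center vertically
        return new double[] { llx, lly, llx + w, lly + h };
    }

    /** The lower right corner of a {llx, lly, urx, ury} rectangle, i.e. (x2, y2). */
    static double[] lowerRight(double[] rect) {
        return new double[] { rect[2], rect[1] };
    }
}
```

For a 600x300 image fitted into a 300x200 box at the origin, the scale factor is 0.5 and the lower right corner is (300, 25), so the vertical text line would start at (302, 27).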
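Number of pages. Using the length prefix with the pages pseudo object you can determine the number of pages in the document:

pagecount = (int) p.pcos_get_number(doc, "length:pages");

Document info fields. The following code retrieves the Title entry of the document info dictionary if it is present:

objtype = p.pcos_get_string(doc, "type:/Info/Title");
if (objtype.equals("string")) {
    /* Document info key found */
    title = p.pcos_get_string(doc, "/Info/Title");
}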
Page size. Although the MediaBox, CropBox, and Rotate entries of a page can directly be obtained via pCOS, they must be evaluated in combination in order to find the actual size of a page. Determining the page size is much easier with the width and height keys of the pages pseudo object. The following code retrieves the width and height of page 3 (note that indices for the pages pseudo object start at 0):
pagenum = 2;
width = p.pcos_get_number(doc, "pages[" + pagenum + "]/width");
height = p.pcos_get_number(doc, "pages[" + pagenum + "]/height");
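To illustrate the evaluation that the width and height keys save you, here is a simplified version of it. PageSize is a hypothetical sketch, not pCOS code: it uses the CropBox if present and the MediaBox otherwise, and swaps width and height for Rotate values of 90 or 270. Real-world files may need further care, e.g. clipping the CropBox to the MediaBox, which this sketch does not do.

```java
/** Hypothetical sketch of deriving the visible page size from the page boxes. */
final class PageSize {
    private PageSize() {}

    /**
     * @param mediaBox {llx, lly, urx, ury}
     * @param cropBox  {llx, lly, urx, ury}, or null if the page has no CropBox
     * @param rotate   value of the Rotate entry (a multiple of 90, possibly negative)
     * @return {width, height} as the page appears after rotation
     */
    static double[] effectiveSize(double[] mediaBox, double[] cropBox, int rotate) {
        double[] box = (cropBox != null) ? cropBox : mediaBox;   // CropBox takes precedence
        double w = Math.abs(box[2] - box[0]);                    // corners may be in any order
        double h = Math.abs(box[3] - box[1]);
        int r = ((rotate % 360) + 360) % 360;                    // e.g. -90 becomes 270
        return (r == 90 || r == 270) ? new double[] { h, w } : new double[] { w, h };
    }
}
```

An A4 page with MediaBox [0 0 595 842] and Rotate 90 yields a width of 842 and a height of 595.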
Listing all fonts in a document. The following sequence creates a list of all fonts in a document along with their embedding status:
fontcount = p.pcos_get_number(doc, "length:fonts");
for (i = 0; i < fontcount; i++) {
    fontname = p.pcos_get_string(doc, "fonts[" + i + "]/name");
    embedded = p.pcos_get_number(doc, "fonts[" + i + "]/embedded");
}
Writing mode. Using pCOS and the fontid value provided in the char_info structure you can easily check whether a font uses vertical writing mode:
if (p.pcos_get_number(doc, "fonts[" + ci->fontid + "]/vertical")) {
    /* font uses vertical writing mode */
}
Encryption status. You can query the pcosmode pseudo object to determine the pCOS mode for the document:
if (p.pcos_get_number(doc, "pcosmode") == 2) {
    /* full pCOS mode */
}
XMP meta data. A stream containing XMP meta data can be retrieved with the following code sequence:
objtype = p.pcos_get_string(doc, "type:/Root/Metadata");
if (objtype.equals("stream")) {
    /* XMP meta data found */
    metadata = p.pcos_get_stream(doc, "", "/Root/Metadata");
}
Streams in PDF generally contain binary data. However, in rare cases (text streams) they may contain textual data instead (e.g. JavaScript streams). In order to trigger the appropriate text conversion, use the convert=unicode option in PDF_pcos_get_stream( ).
The meaning of the various path components is as follows:
> The optional prefix can attain the values listed in Table 9.2.
> The optional pseudo object name may contain one of the values described in Section 9.5, Pseudo Objects, page 226.
> The name components are dictionary keys found in the document. Multiple names are separated with a / character. An empty path, i.e. a single /, denotes the document's Trailer dictionary. Each name must be a dictionary key present in the preceding dictionary. Full paths describe the chain of dictionary keys from the initial dictionary (which may be the Trailer or a pseudo object) to the target object.
> Paths or path components specifying an array or dictionary can have a numerical index which must be specified in decimal format between brackets. Nested arrays or dictionaries can be addressed with multiple index entries. The first entry in an array or dictionary has index 0.
> Paths or path components specifying a dictionary can have an index qualifier plus one of the suffixes .key or .val. This can be used to retrieve a particular dictionary key or the corresponding value of the indexed dictionary entry, respectively. If a path for a dictionary has an index qualifier, it must be followed by one of these suffixes.

Encoding for pCOS paths. In most cases pCOS paths will contain only plain ASCII characters. However, in a few cases (e.g. PDFlib Block names) non-ASCII characters may be required. pCOS paths must be encoded according to the following rules:
> When a path component contains any of the characters /, [, ], or #, these must be expressed by a number sign # followed by a two-digit hexadecimal number (the character code, e.g. #2F for /).
> In Unicode-aware language bindings the path consists of a regular Unicode string which may contain ASCII and non-ASCII characters.
> In non-Unicode-aware language bindings the path must be supplied in UTF-8. The string may or may not contain a BOM, but this doesn't make any difference. A BOM may be placed at the start of the path, or at the start of individual path components (i.e. after a slash character). On EBCDIC systems the path must generally be supplied in EBCDIC encoding. Characters outside the ASCII character set must be supplied as EBCDIC-UTF-8 (with or without BOM).
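The escaping rule for path components takes only a few lines to implement. The helper name escapeComponent is hypothetical, and the use of uppercase hex digits is an assumption, since the rule above only requires two hexadecimal digits. The digits are the ASCII code of the escaped character, e.g. 2F for /.

```java
/** Hypothetical helper for the pCOS path encoding rule for /, [, ] and #. */
final class PcosPath {
    private PcosPath() {}

    /** Replaces each reserved character in one path component by '#' and two hex digits. */
    static String escapeComponent(String component) {
        StringBuilder sb = new StringBuilder(component.length());
        for (int i = 0; i < component.length(); i++) {
            char c = component.charAt(i);
            if (c == '/' || c == '[' || c == ']' || c == '#') {
                sb.append('#').append(String.format("%02X", (int) c));
            } else {
                sb.append(c);   // all other characters, including non-ASCII ones, are kept
            }
        }
        return sb.toString();
    }
}
```

For example, the component a/b[1]# becomes a#2Fb#5B1#5D#23, which can then be combined with other components using the / separator.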
Path prefixes. Prefixes can be used to query various attributes of an object (as opposed to its actual value). Table 9.2 lists all supported prefixes. The length prefix and content enumeration via indices are only applicable to plain PDF objects and pseudo objects of type array, but not to any other pseudo objects. The pcosid prefix cannot be applied to pseudo objects. The type prefix is supported for all pseudo objects.
Table 9.2 pCOS path prefixes
prefix      explanation
length
    (Number) Length of an object, which depends on the object's type:
    array      Number of elements in the array
    dict       Number of key/value pairs in the dictionary
    stream     Number of key/value pairs in the stream dict (not the stream length; use the Length key to determine the length of stream data in bytes)
    fstream    Same as stream
    other      0
pcosid
    (Number) Unique pCOS ID for an object of type dictionary or array. If the path describes an object which doesn't exist in the PDF, the result will be -1. This can be used to check for the existence of an object, and at the same time obtain an ID if it exists.
type
    (String or number) Type of the object as number or string:
    0, null       Null object or object not present (use to check existence of an object)
    1, boolean    Boolean object
    2, number     Integer or real number
    3, name       Name object
    4, string     String object
    5, array      Array object
    6, dict       Dictionary object (but not stream)
    7, stream     Stream object which uses only supported filters
    8, fstream    Stream object which uses one or more unsupported filters
    Enums for these types are available for the convenience of C and C++ developers.
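Java developers can model the type codes in a similar way. The enum below is a hypothetical convenience type, not part of the PDFlib Java binding; it maps both the numeric and the string form of the type prefix result to a constant.

```java
/**
 * Hypothetical Java enum (not part of the PDFlib binding) for the values
 * returned by the pCOS "type:" prefix, in numeric and string form.
 */
enum PcosType {
    NULL(0, "null"), BOOLEAN(1, "boolean"), NUMBER(2, "number"), NAME(3, "name"),
    STRING(4, "string"), ARRAY(5, "array"), DICT(6, "dict"),
    STREAM(7, "stream"), FSTREAM(8, "fstream");

    final int code;      // numeric form of the type
    final String label;  // string form of the type

    PcosType(int code, String label) {
        this.code = code;
        this.label = label;
    }

    /** Looks up the constant for a numeric type code (0 to 8). */
    static PcosType fromCode(int code) {
        for (PcosType t : values()) {
            if (t.code == code) return t;
        }
        throw new IllegalArgumentException("unknown pCOS type code: " + code);
    }

    /** Looks up the constant for a string such as "dict" or "fstream". */
    static PcosType fromLabel(String label) {
        for (PcosType t : values()) {
            if (t.label.equals(label)) return t;
        }
        throw new IllegalArgumentException("unknown pCOS type: " + label);
    }

    /** Type 0 also signals a missing object, so any other type means the object exists. */
    boolean exists() {
        return this != NULL;
    }
}
```

Assuming the numeric type code for a path such as type:/Info/Title has been retrieved, it can be passed to fromCode( ), and exists( ) then tells whether the key is present.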
noaccessible, noannots, noassemble, nocopy, noforms, nohiresprint, nomodify, noprint
    (Boolean) True if the respective access protection is set, false otherwise
plainmetadata
    (Boolean) True if the PDF contains unencrypted meta data, false otherwise
extensionlevel
    (String) Adobe Extension Level based on ISO 32000, or 0 if no extension level is present. Acrobat 9 creates documents with extension level 3.
filename
    (String) Name of the PDF file.
filesize
    (Number) Size of the PDF file in bytes
fullpdfversion
    (Number) Numerical value for the PDF version number. The numbers increase monotonically for each PDF/Acrobat version. The value 100 * BaseVersion + ExtensionLevel will be returned, e.g.
    150    PDF 1.5 (Acrobat 6)
    160    PDF 1.6 (Acrobat 7)
    170    PDF 1.7 (Acrobat 8)
    173    PDF 1.7 Adobe Extension Level 3 (Acrobat 9)
linearized
Table 9.3 Universal pseudo objects
object name     explanation
major, minor, revision
    (Number) Major, minor, or revision number of the library, respectively.
pcosinterface
    (Number) Interface number of the underlying pCOS implementation. This specification describes interface number 5. The following table details which product versions implement various pCOS interface numbers:
    1    TET 2.0, 2.1
    2    pCOS 1.0
    3    PDFlib+PDI 7, PPS 7, TET 2.2, pCOS 2.0, PLOP 3.0, TET 2.3
    4    PLOP 4.0, TET 3.0
    5    PDFlib+PDI 8, PPS 8
pcosmode pcosmodename
    (Number) PDF version number multiplied by 10, e.g. 16 for PDF 1.6
    (String) Full PDF version string in the form expected by various API functions for setting the PDF output compatibility, e.g. 1.5, 1.6, 1.7, 1.7ext3
    (Boolean) True if and only if security settings were ignored when opening the PDF document; the client must take care of honoring the document author's intentions. For TET the value will be true, and content extraction will be allowed, if all of the following conditions are true:
    > Shrug mode has been enabled with the shrug option.
    > The document has a master password but this has not been supplied.
    > The user password (if required for the document) has been supplied.
    > Content extraction is not allowed in the document's permission settings.
version (String) Full library version string in the format <major>.<minor>.<revision>, possibly suffixed with additional qualifiers such as beta, rc, etc.
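The fullpdfversion formula, and the version number multiplied by 10 listed above, can be reproduced from a version string such as 1.7ext3. The class PdfVersion is a hypothetical sketch. It assumes the string format shown for the full PDF version string (base version, optionally followed by ext and the extension level) and a single-digit minor version, which holds for PDF 1.x; it uses integer arithmetic to avoid floating-point rounding.

```java
/** Hypothetical helpers that reproduce the version arithmetic from PDF version strings. */
final class PdfVersion {
    private PdfVersion() {}

    /** Splits e.g. "1.7ext3" into {1, 7, 3}; a missing extension level counts as 0. */
    static int[] parse(String version) {
        String[] parts = version.split("ext", 2);
        String[] base = parts[0].split("\\.");
        int major = Integer.parseInt(base[0]);
        int minor = Integer.parseInt(base[1]);
        int ext = (parts.length > 1) ? Integer.parseInt(parts[1]) : 0;
        return new int[] { major, minor, ext };
    }

    /** 100 * BaseVersion + ExtensionLevel (assumes a one-digit minor version). */
    static int fullPdfVersion(String version) {
        int[] v = parse(version);
        return 100 * v[0] + 10 * v[1] + v[2];
    }

    /** PDF version number multiplied by 10, e.g. 16 for "1.6". */
    static int versionTimes10(String version) {
        int[] v = parse(version);
        return 10 * v[0] + v[1];
    }
}
```

It yields 150 for 1.5 and 173 for 1.7ext3, matching the values listed for fullpdfversion.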
Pseudo objects for PDF objects, pages, and interactive elements. Table 9.4 lists pseudo objects which can be used for retrieving object or page information, or serve as shortcuts for various interactive elements.
Table 9.4 Pseudo objects for PDF objects, pages, and interactive elements
object name     explanation
articles
    (Array of dicts) Array containing the article thread dictionaries for the document. The array will have length 0 if the document does not contain any article threads. In addition to the standard PDF keys pCOS supports the following pseudo key for dictionaries in the articles array:
    beads      (Array of dicts) Bead directory with the standard PDF keys, plus the following:
               destpage (Number) Number of the target page (first page is 1)
bookmarks
    (Array of dicts) Array containing the bookmark (outlines) dictionaries for the document. In addition to the standard PDF keys pCOS supports the following pseudo keys for dictionaries in the bookmarks array:
    level      (Number) Indentation level in the bookmark hierarchy
    destpage   (Number) Number of the target page (first page is 1) if the bookmark points to a page in the same document, -1 otherwise.
fields
    (Array of dicts) Array containing the form field dictionaries for the document. In addition to the standard PDF keys in the field dictionary and the entries in the associated Widget annotation dictionary, pCOS supports the following pseudo keys for dictionaries in the fields array:
    level      (Number) Level in the field hierarchy (determined by "." as separator)
    fullname   (String) Complete name of the form field. The same naming conventions as in Acrobat 7 will be applied.
names
    (Dict) A dictionary where each entry provides simple access to a name tree. The following name trees are supported: AP, AlternatePresentations, Dests, EmbeddedFiles, IDS, JavaScript, Pages, Renditions, Templates, URLS. Each name tree can be accessed by using the name as a key to retrieve the corresponding value, e.g.:
    names/Dests[0].key    retrieves the name of a destination
    names/Dests[0].val    retrieves the corresponding destination dictionary
    In addition to standard PDF dictionary entries the following pseudo key for dictionaries in the Dests name tree is supported:
    destpage   (Number) Number of the target page (first page is 1) if the destination points to a page in the same document, -1 otherwise.
    In order to retrieve other name tree entries these must be queried directly via /Root/Names/Dests etc., since they are not present in the name tree pseudo objects.
objects
    (Array) Address an element for which a pCOS ID has been retrieved earlier using the pcosid prefix. The ID must be supplied as array index in decimal form; as a result, the PDF object with the supplied ID will be addressed. The length prefix cannot be used with this array.
Table 9.4 Pseudo objects for PDF objects, pages, and interactive elements
object name     explanation
pages
    (Array of dicts) Each array element addresses a page of the document. Indexing it with the decimal representation of the page number minus one addresses that page (the first page has index 0). Using the length prefix the number of pages in the document can be determined. A page object addressed this way will incorporate all attributes which are inherited via the /Pages tree. The following entries will be inherited: CropBox, MediaBox, Resources, Rotate. The entries /MediaBox and /Rotate are guaranteed to be present. In addition to standard PDF dictionary entries the following pseudo entries are available for each page:
    colorspaces, extgstates, fonts, images, patterns, properties, shadings, templates
        (Arrays of dicts) Page resources according to Table 9.5.
    annots
        (Array of dicts) In addition to the standard PDF keys in the Annots array pCOS supports the following pseudo key for dictionaries in the annots array:
        destpage (Number; only for Subtype=Link and if a Dest entry is present) Number of the target page (first page is 1)
    blocks
        (Array of dicts) Shorthand for pages[ ]/PieceInfo/PDFlib/Private/Blocks[ ], i.e. the page's block dictionary. In addition to the existing PDF keys pCOS supports the following pseudo key for dictionaries in the blocks array:
        rect (Rectangle) Similar to Rect, except that it takes into account any relevant CropBox/MediaBox and Rotate entries and normalizes coordinate ordering.
    height
        (Number) Height of the page. The MediaBox or the CropBox (if present) will be used to determine the height. Rotate entries will also be applied.
    (Boolean) True if the page is empty, and false if the page is not empty
    (String) The page label of the page (including any prefix which may be present). Labels will be displayed as in Acrobat. If no label is present (or the PageLabel dictionary is malformed), the string will contain the decimal page number. Roman numbers will be created in Acrobat's style (e.g. VL), not in classical style which is different (e.g. XLV). If /Root/PageLabels doesn't exist, the document doesn't contain any page labels.
    width
        (Number) Width of the page (same rules as for height)
pdfa
    (String) PDF/A conformance level of the document. Possible values are none, PDF/A-1a:2005, PDF/A-1b:2005, PDF/A-2a, PDF/A-2b, PDF/A-2u.
pdfe
    (String) PDF/E conformance level of the document. Possible values are none and PDF/E-1.
pdfx
    (String) PDF/X conformance level of the document. Possible values are none and the following: PDF/X-1:2001, PDF/X-1a:2001, PDF/X-1a:2003, PDF/X-2:2003, PDF/X-3:2002, PDF/X-3:2003, PDF/X-4, PDF/X-4p, PDF/X-5g, PDF/X-5n, PDF/X-5p
tagged
    (Boolean) True if the PDF document is tagged, false otherwise
Pseudo objects for simplified resource handling. Resources are a key concept for managing various kinds of data which are required for completely describing the contents of a page. The resource concept in PDF is very powerful and efficient, but complicates access with various technical concepts, such as recursion and resource inheritance. pCOS greatly simplifies resource retrieval and supplies several groups of pseudo objects which can be used to directly query resources. Some of these pseudo resource dictionaries contain entries in addition to the standard PDF keys in order to further simplify resource information retrieval. pCOS pseudo resources reflect resources from the user's point of view, and differ from native PDF resources:
> Some entries may have been added (e.g. inline images, simple color spaces) or deleted (e.g. listed fonts which are not used on any page).
> In addition to the original PDF dictionary keys, resource dictionaries may contain some user-friendly keys for auxiliary information (e.g. embedding status of a font, number of components of a color space).

pCOS supports two groups of pseudo objects for resource retrieval. Global resource arrays contain all resources of a given type in a PDF document, while page-based resources contain only the resources used by a particular page. The corresponding pseudo arrays are available for all resource types listed in Table 9.5:
> A list of all resources in the document is available in the global resource array (e.g. images[ ]). Retrieving the length of one of the global resource pseudo arrays results in a resource scan (see below) for all pages.
> A list of resources on each page is available in the page-based resource array (e.g. pages[ ]/images[ ]). Accessing the length of one of a page's resource pseudo arrays results in a resource scan for that page (to collect all resources which are actually used on the page, and to merge images on that page).
A resource scan is a full scan of the page including Unicode mapping and image merging, but excluding Wordfinder operation. Applications which require a full resource listing and all page contents are recommended to process all pages before querying resource pseudo objects in order to avoid the resource scans in addition to the regular page scans.
Table 9.5 Pseudo objects for resources; each resource category P creates two resource arrays P[ ] and pages[ ]/P[ ].
object name     explanation
colorspaces
    (Array of dicts) Array containing dictionaries for all color spaces on the page or in the document. In addition to the standard PDF keys in color space and ICC profile stream dictionaries the following pseudo keys are supported:
    alternateid     (Integer; only for name=Separation and DeviceN) Index of the underlying alternate color space in the colorspaces[ ] pseudo object.
    baseid          (Integer; only for name=Indexed) Index of the underlying base color space in the colorspaces[ ] pseudo object.
    colorantname    (Name; only for name=Separation) Name of the colorant. Non-ASCII CJK color names will be converted to Unicode.
    colorantnames   (Array of names; only for name=DeviceN) Names of the colorants
    components      (Integer) Number of components of the color space
    name            (String) Name of the color space: CalGray, CalRGB, DeviceCMYK, DeviceGray, DeviceN, DeviceRGB, ICCBased, Indexed, Lab, Separation
    csarray         (Array; not for name=DeviceGray/RGB/CMYK) Array describing the underlying native color space, i.e. the original color space object in the PDF.
    Color space resources will include all color spaces which are referenced from any type of object, including the color spaces which do not require native PDF resources (i.e. DeviceGray, DeviceRGB, and DeviceCMYK).
extgstates
    (Array of dicts) Array containing the dictionaries for all extended graphics states (ExtGStates) on the page or in the document
fonts
    (Array of dicts) Array containing dictionaries for all fonts on the page or in the document. In addition to the standard PDF keys in font dictionaries, the following pseudo keys are supported:
    name            (String) PDF name of the font without any subset prefix. Non-ASCII CJK font names will be converted to Unicode.
                    (String) Font type: (unknown), Composite, Multiple Master, OpenType, TrueType, TrueType (CID), Type 1, Type 1 (CID), Type 1 CFF, Type 1 CFF (CID), Type 3
    vertical        (Boolean) true for fonts with vertical writing mode, false otherwise
231
Table 9.5 Pseudo objects for resources; each resource category P creates two resource arrays P[ ] and pages[ ]/P[ ]. object name images explanation (Array of dicts) Array containing dictionaries for all images on the page or in the document. The TET product will add merged (artificial) images to the images[ ] array. In addition to the standard PDF keys the following pseudo keys are supported: bpc (Integer) The number of bits per component. This entry is usually the same as BitsPerComponent, but unlike this it is guaranteed to be available. For JPEG2000 images it may be -1 since the number of bits per component may not be available in the PDF structures.
colorspaceid (Integer) Index of the images color space in the colorspaces[] pseudo object. This can be used to retrieve detailed color space properties. For JPEG 2000 images the color space id may be -1 since the color space may not be encoded in the PDF structures. filterinfo (Dict) Describes the remaining filter for streams with unsupported filters or when retrieving stream data with the keepfilter option set to true. If there is no such filter no filterinfo dictionary will be available. The dictionary contains the following entries: name (Name) Name of the filter supported (Boolean) True if the filter is supported decodeparms (Dict) The DecodeParms dictionary if one is present for the filter
mergetype (Integer) The following types describe the status of the image: 0 (normal) The image corresponds to an image in the PDF. 1 (artificial) The image is the result of merging multiple consumed images (i.e. images with mergetype=2) into a single image. The resulting artificial image does not exist in the PDF data structures as an object. 2 (consumed) The image should be ignored since it has been merged into a larger image. Although the image exists in the PDF, it usually should not be extracted because it is part of an artificial image (i.e. an image with mergetype=1). This entry reflects information regarding all pages processed so far. It may change its value while processing other pages in the document. If final (constant) information is required, all pages in the document must have been processed, or the value of the pCOS path length:images must have been retrieved. patterns properties shadings (Array of dicts) Array containing dictionaries for all patterns on the page or in the document (Array of dicts) Array containing dictionaries for all properties on the page or in the document (Array of dicts) Array containing dictionaries for all shadings on the page or in the document. In addition to the standard PDF keys in shading dictionaries the following pseudo key is supported: colorspaceid (Integer) Index of the underlying color space in the colorspaces[] pseudo object. templates (Array of dicts) Array containing dictionaries for all templates (Form XObjects) on the page or in the document
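The mergetype values suggest a simple filter for image extraction in products which merge images. The following Java sketch assumes a hypothetical PcosReader interface standing in for pcos_get_number(doc, path), so that the logic can run without a PDF document. It retrieves length:images first, which according to the table makes the mergetype values final, and then skips consumed images.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: pick the images[] entries worth extracting, based on mergetype. */
class ImageSelection {
    /** Hypothetical stand-in for pcos_get_number(doc, path). */
    interface PcosReader {
        double getNumber(String path);
    }

    static final int CONSUMED = 2; // mergetype of images merged into a larger one

    /** Returns the indexes into images[] whose mergetype is not "consumed". */
    static List<Integer> extractable(PcosReader pcos) {
        // Retrieving length:images triggers a resource scan of all pages,
        // so the mergetype values read below are final.
        int count = (int) pcos.getNumber("length:images");
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            int mergetype = (int) pcos.getNumber("images[" + i + "]/mergetype");
            if (mergetype != CONSUMED) {
                result.add(i); // normal (0) or artificial (1) image
            }
        }
        return result;
    }
}
```

In a real program the reader would simply forward each path to p.pcos_get_number(doc, path).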
Table 10.1 PDFlib features which require a specific PDF compatibility mode (feature: PDFlib API functions and options)
Features which require PDF 1.7 extension level 3 (Acrobat 9)
> Geospatial PDF: PDF_begin_document( ): option viewports; PDF_load_image( ): option georeference
> PDF portfolios: PDF_begin_document( ): option portfolio; PDF_add_portfolio_folder( )
> AES encryption with 256-bit keys: PDF_begin_document( ): AES encryption with 256-bit keys will automatically be used with compatibility=1.7ext3 when the masterpassword, userpassword, attachmentpassword, or permissions option is supplied
Features which require PDF 1.6 (Acrobat 7)
> user units: PDF_begin/end_document( ): option userunit
> print scaling: PDF_begin/end_document( ): suboption printscaling for the viewerpreferences option
> document open mode: PDF_begin/end_document( ): option openmode=attachments
> AES encryption with 128-bit keys: PDF_begin_document( ): AES encryption will automatically be used with compatibility=1.6 or 1.7 when the masterpassword, userpassword, attachmentpassword, or permissions option is supplied
> PDF_begin/end_document( ): option attachmentpassword
> PDF_begin/end_document( ): suboption description for option attachments
> PDF_load_3ddata( ), PDF_create_3dview( ); PDF_create_annotation( ): type=3D
Features which require PDF 1.5 (Acrobat 6)
> certain field options: PDF_create_field( ) and PDF_create_fieldgroup( )
> page layout: PDF_begin/end_document( ): option pagelayout=twopageleft/right
> certain annotation options: PDF_create_annotation( )
> extended permission settings: permissions=plainmetadata in PDF_begin_document( ), see Table 10.2
> certain CMaps for CJK fonts: PDF_load_font( ), see Table 4.7
> Tagged PDF: certain options for PDF_begin_item( ); PDF_begin/end_page_ext( ): option taborder
> Layers: PDF_define_layer( ), PDF_begin_layer( ), PDF_end_layer( ), PDF_set_layer_dependency( )
> JPEG2000 images: imagetype=jpeg2000 in PDF_load_image( )
> compressed object streams: compressed object streams will automatically be generated with compatibility=1.5 or above unless objectstreams=none has been set in PDF_begin_document( )
Features which require PDF 1.4 (Acrobat 5)
> smooth shadings (color blends): PDF_shading_pattern( ), PDF_shfill( ), PDF_shading( )
> soft masks: PDF_load_image( ) with the masked option referring to an image with more than 1 bit pixel depth
> JBIG2 images: imagetype=jbig2 in PDF_load_image( )
> 128-bit encryption: PDF_begin_document( ) with the userpassword, masterpassword, permissions options
> extended permission settings: PDF_begin_document( ) with permissions option, see Table 10.2
> certain CMaps for CJK fonts: PDF_load_font( ), see Table 4.7
> transparency and other graphics state options: PDF_create_gstate( ) with options alphaisshape, blendmode, opacityfill, opacitystroke, textknockout
> certain options for actions: PDF_create_action( )
> certain options for annotations: PDF_create_annotation( )
> certain field options: PDF_create_field( ) and PDF_create_fieldgroup( )
> Tagged PDF: tagged option in PDF_begin_document( )
> Referenced PDF: reference option in PDF_open_pdi_page( ) and PDF_begin_template_ext( ) (however, note that this feature requires Acrobat 9 for proper display/printing)
PDF version of documents imported with PDI. In all compatibility modes only PDF documents with a compatible PDF version can be imported with PDI. If you must import a PDF with a newer PDF version, you must set the compatibility option accordingly (see Section 7.2.3, Acceptable PDF Documents, page 162). As an exception, documents according to PDF 1.7 extension level 3 can be imported into PDF 1.7 documents.
Changing the PDF version of a document. If you must create output according to a particular PDF version, but need to import PDFs which use a higher PDF version, you must convert the documents to the desired lower PDF version before you can import them with PDI. You can use the menu item Advanced, PDF Optimizer, Make compatible with in Acrobat 7/8/9 Professional to change the PDF version as follows:
> Acrobat 7: PDF 1.3 - PDF 1.6
> Acrobat 8: PDF 1.3 - PDF 1.7
> Acrobat 9: PDF 1.3 - PDF 1.7 extension level 3
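The import rule above amounts to an ordering of compatibility levels plus one exception. A minimal Java sketch, assuming that the compatibility option takes the values 1.3 through 1.7ext3 in ascending order; CompatibilityCheck and canImport() are hypothetical helpers, not PDFlib functions.

```java
import java.util.List;

/** Sketch of the PDI rule: imported PDF version must not exceed the output compatibility. */
class CompatibilityCheck {
    // Assumed compatibility levels in ascending order; the list index defines the ordering.
    private static final List<String> LEVELS =
        List.of("1.3", "1.4", "1.5", "1.6", "1.7", "1.7ext3");

    static int rank(String level) {
        int r = LEVELS.indexOf(level);
        if (r < 0) {
            throw new IllegalArgumentException("unknown level: " + level);
        }
        return r;
    }

    /** True if a PDF with version docVersion can be imported into output with the given compatibility. */
    static boolean canImport(String docVersion, String compatibility) {
        // Documented exception: PDF 1.7 extension level 3 documents can be imported into PDF 1.7 output.
        if (docVersion.equals("1.7ext3") && compatibility.equals("1.7")) {
            return true;
        }
        return rank(docVersion) <= rank(compatibility);
    }
}
```

If canImport() returns false, the options are to raise the compatibility setting or to convert the input with Acrobat as described above.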
There is nothing inherent in PDF encryption that enforces the document permissions specified in the encryption dictionary. It is up to the implementors of PDF viewers to respect the intent of the document creator by restricting user access to an encrypted PDF file according to the permissions contained in the file.
Unicode passwords. PDF 1.7 extension level 3 (Acrobat 9) supports Unicode passwords: arbitrary Unicode characters can be used within user or master passwords.
Permissions. Access restrictions can be set with the permissions option in PDF_begin_document( ). It contains one or more access restriction keywords. When setting the permissions option, the masterpassword option must also be set, because otherwise Acrobat users could easily remove the permission settings. By default, all actions are allowed. Specifying an access restriction will disable the respective feature in Acrobat. Access restrictions can be applied without any user password. Multiple restriction keywords can be specified as in the following example:
p.begin_document(filename, "masterpassword=abc123 permissions={noprint nocopy}");
Table 10.2 lists all supported access restriction keywords. Cookbook A full code sample can be found in the Cookbook topic general/permission_settings.
Table 10.2 Access restriction keywords for the permissions option in PDF_begin_document( ) (keyword: explanation)
> noprint: Acrobat will prevent printing the file.
> nomodify: Acrobat will prevent editing or cropping pages and creating or changing form fields.
> nocopy: Acrobat will prevent copying and extracting text or graphics; the accessibility interface will be controlled by noaccessible.
> noannots: Acrobat will prevent creating or changing annotations and form fields.
> noforms: (PDF 1.4; implies nomodify and noannots) Acrobat will prevent form field filling.
> noaccessible: (PDF 1.4) Acrobat will prevent extracting text or graphics for accessibility purposes (such as a screenreader program).
> noassemble: (PDF 1.4; implies nomodify) Acrobat will prevent inserting, deleting, or rotating pages and creating bookmarks and thumbnails.
> nohiresprint: (PDF 1.4) Acrobat will prevent high-resolution printing. If noprint isn't set, printing is restricted to the print as image feature which prints a low-resolution rendition of the page.
> plainmetadata: (PDF 1.5) Keep XMP document metadata unencrypted even for encrypted documents.
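Table 10.2 documents two implications (noforms implies nomodify and noannots; noassemble implies nomodify) and the PDF version some keywords require. The Java sketch below checks keywords against the table, expands the implications, reports the highest PDF version noted for them, and assembles an option list like the one in the example above, refusing to do so without a master password. PermissionOptions and its methods are hypothetical helpers; passwords containing blanks would presumably need additional quoting in a real option list, which the sketch does not handle.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.TreeSet;

/** Sketch: validate access restriction keywords and build the PDF_begin_document( ) options. */
class PermissionOptions {
    static final Set<String> KEYWORDS = new LinkedHashSet<>(Arrays.asList(
        "noprint", "nomodify", "nocopy", "noannots", "noforms",
        "noaccessible", "noassemble", "nohiresprint", "plainmetadata"));

    /** Validates the keywords and adds those implied by noforms and noassemble. */
    static Set<String> expand(Collection<String> keywords) {
        Set<String> result = new TreeSet<>();
        for (String k : keywords) {
            if (!KEYWORDS.contains(k)) {
                throw new IllegalArgumentException("unknown keyword: " + k);
            }
            result.add(k);
            if (k.equals("noforms")) { result.add("nomodify"); result.add("noannots"); }
            if (k.equals("noassemble")) { result.add("nomodify"); }
        }
        return result;
    }

    /** Highest PDF version noted in Table 10.2 for these keywords, or null if none is noted. */
    static String requiredVersion(Collection<String> keywords) {
        String version = null;
        for (String k : expand(keywords)) {
            if (k.equals("plainmetadata")) {
                return "1.5"; // highest version in the table
            }
            if (k.equals("noforms") || k.equals("noaccessible")
                    || k.equals("noassemble") || k.equals("nohiresprint")) {
                version = "1.4";
            }
        }
        return version;
    }

    /** The permissions option is only accepted together with a master password. */
    static String optionList(String masterPassword, Collection<String> keywords) {
        if (masterPassword == null || masterPassword.isEmpty()) {
            throw new IllegalArgumentException("permissions require a masterpassword");
        }
        expand(keywords); // validation only; keywords are passed on as supplied
        return "masterpassword=" + masterPassword
             + " permissions={" + String.join(" ", keywords) + "}";
    }
}
```

The resulting string can be passed as the option list of p.begin_document( ).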
Note When serving PDFs over the Web, clients can always produce a local copy of the document with their browser. There is no way for a PDF to prevent users from saving a local copy. Encrypted file attachments. In PDF 1.6 and above file attachments can be encrypted even in otherwise unprotected documents. This can be achieved by supplying the attachmentpassword option to PDF_begin_document( ).
In Acrobat you can check whether a file is linearized by looking at its document properties (Fast Web View: yes).
> The Web server must support byteserving. The underlying byterange protocol is part of HTTP 1.1 and therefore implemented in all current Web servers.
> The user must use Acrobat as a Browser plugin, and have page-at-a-time download enabled in Acrobat (Edit, Preferences, [General...,] Internet, Allow fast web view). Note that this is enabled by default.
The larger a PDF file (measured in pages or MB), the more it will benefit from linearization when delivered over the Web.
Note Linearizing a PDF document generally slightly increases its file size due to the additional linearization information.
Temporary storage requirements for linearization. PDFlib must create the full document before it can be linearized; the linearization process will be applied in a separate step after the document has been created. For this reason PDFlib has additional storage requirements for linearization. Temporary storage will be required which has roughly the same size as the generated document (without linearization). Subject to the inmemory option in PDF_begin_document( ), PDFlib will place the linearization data either in memory or in a temporary disk file.
Note Due to a problem in Acrobat Preflight, PDF/X validation fails in Acrobat versions up to 9.1.3 for PDF/X-5g if external pages are referenced.
> The BleedBox, if present, must fully contain the ArtBox and TrimBox. > The CropBox, if present, must fully contain the ArtBox and TrimBox.
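The two box rules can be checked before a page is created. In this minimal Java sketch each box is given as {llx, lly, urx, ury} in points and null stands for a box which is not present; PdfxBoxCheck is a hypothetical helper, not part of PDFlib.

```java
/** Sketch: BleedBox and CropBox, if present, must fully contain ArtBox and TrimBox. */
class PdfxBoxCheck {
    /** True if inner lies completely within outer; boxes are {llx, lly, urx, ury}. */
    static boolean contains(double[] outer, double[] inner) {
        return outer[0] <= inner[0] && outer[1] <= inner[1]
            && outer[2] >= inner[2] && outer[3] >= inner[3];
    }

    /** A null argument means that the box is not present. */
    static boolean boxesOk(double[] crop, double[] bleed, double[] trim, double[] art) {
        for (double[] outer : new double[][] { bleed, crop }) {
            if (outer == null) {
                continue; // absent box: no constraint
            }
            for (double[] inner : new double[][] { trim, art }) {
                if (inner != null && !contains(outer, inner)) {
                    return false;
                }
            }
        }
        return true;
    }
}
```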
Table 10.3 Operations which must be applied for PDF/X compatibility (item: PDFlib function and option requirements for PDF/X compatibility)
> grayscale color: PDF/X-3/4/5: Grayscale images and PDF_setcolor( ) with a gray color space can only be used if the output condition is a grayscale or CMYK device, or if the defaultgray option in PDF_begin_page_ext( ) has been set.
> RGB color: PDF/X-3/4/5: RGB images and PDF_setcolor( ) with an RGB color space can only be used if the output condition is an RGB device, or the defaultrgb option in PDF_begin_page_ext( ) has been set.
> CMYK color: PDF/X-3/4/5: CMYK images and PDF_setcolor( ) with a CMYK color space can only be used if the output condition is a CMYK device, or the defaultcmyk option in PDF_begin_page_ext( ) has been set.
> The Creator and Title info keys must be set to a non-empty value with PDF_set_info( ) or (in PDF/X-4 and PDF/X-5) with the xmp:CreatorTool and dc:title XMP properties in the metadata option of PDF_begin/end_document( ).
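The color rules of Table 10.3 for PDF/X-3/4/5 reduce to a small decision function. A Java sketch, assuming the output condition can be classified as a grayscale, RGB, or CMYK device; the defaultSet flag stands for the matching defaultgray, defaultrgb, or defaultcmyk option in PDF_begin_page_ext( ). PdfxColorCheck is a hypothetical helper, and the stricter PDF/X-1a rules of Table 10.4 are not covered.

```java
/** Sketch of the PDF/X-3/4/5 color rules in Table 10.3. */
class PdfxColorCheck {
    enum Device { GRAY, RGB, CMYK }

    /**
     * colorSpace: device color space of an image or PDF_setcolor( ) call;
     * outputCondition: device class of the PDF/X output intent;
     * defaultSet: the matching defaultgray/defaultrgb/defaultcmyk option was set.
     */
    static boolean allowed(Device colorSpace, Device outputCondition, boolean defaultSet) {
        if (defaultSet) {
            return true;
        }
        switch (colorSpace) {
            case GRAY: return outputCondition == Device.GRAY || outputCondition == Device.CMYK;
            case RGB:  return outputCondition == Device.RGB;
            case CMYK: return outputCondition == Device.CMYK;
            default:   throw new IllegalArgumentException("unknown color space: " + colorSpace);
        }
    }
}
```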
Prohibited operations. Table 10.4 lists all operations which are prohibited when generating PDF/X-conforming output. The items apply to all PDF/X conformance levels unless otherwise noted. Calling one of the prohibited functions while in PDF/X mode will trigger an exception. Similarly, if an imported PDF page doesn't match the current PDF/X conformance level, the corresponding PDI call will fail.
Table 10.4 Operations which must be avoided or are restricted to achieve PDF/X compatibility (item: prohibited or restricted PDFlib functions and options for PDF/X compatibility)
> grayscale color: PDF/X-1a: the defaultgray option in PDF_begin_page_ext( ) must be avoided.
> RGB color: PDF/X-1a: RGB images and the defaultrgb option in PDF_begin_page_ext( ) must be avoided.
> CMYK color: PDF/X-1a: the defaultcmyk option in PDF_begin_page_ext( ) must be avoided.
> ICC-based color: PDF/X-1a: the iccbasedgray/rgb/cmyk color space in PDF_setcolor( ) and the setcolor:iccprofilegray/rgb/cmyk parameters must be avoided.
> Lab color: PDF/X-1a: the Lab color space in PDF_setcolor( ) must be avoided.
> annotations and form fields: Annotations inside the BleedBox (or TrimBox/ArtBox if no BleedBox is present) must be avoided: PDF_create_annotation( ), PDF_create_field( ).
> actions and JavaScript: All actions including JavaScript must be avoided: PDF_create_action( ).
> images: PDF/X-1a: images with RGB, ICC-based, YCbCr, or Lab color must be avoided. For colorized images the alternate color of the spot color used must satisfy the same conditions. PDF/X-1 and PDF/X-3: JBIG2 images must be avoided. The OPI-1.3 and OPI-2.0 options in PDF_load_image( ) must be avoided.
> transparent images and graphics: PDF/X-1 and PDF/X-3: Soft masks for images must be avoided: the masked option for PDF_load_image( ) must be avoided unless the mask refers to a 1-bit image. Images with implicit transparency (alpha channel) are not allowed; they must be loaded with the ignoremask option of PDF_load_image( ). The opacityfill and opacitystroke options for PDF_create_gstate( ) must be avoided unless they have a value of 1. Transparent images and graphics are allowed in PDF/X-4 and PDF/X-5.
> transparency groups: The transparencygroup option of PDF_begin/end_page_ext( ), PDF_begin_template_ext( ), and PDF_open_pdi_page( ) is allowed only in PDF/X-4 and PDF/X-5, not in PDF/X-1 and PDF/X-3. If transparencygroup is used, the values of the CS suboption are subject to the following requirements:
  > DeviceGray: the PDF/X output condition must be a grayscale or CMYK device. For the generated page (but not for templates and imported pages) the defaultgray option in PDF_begin_page_ext( ) can be set as an alternative.
  > DeviceRGB: the PDF/X output condition must be an RGB device. For the generated page (but not for templates and imported pages) the defaultrgb option in PDF_begin_page_ext( ) can be set as an alternative.
  > DeviceCMYK: the PDF/X output condition must be a CMYK device. For the generated page (but not for templates and imported pages) the defaultcmyk option in PDF_begin_page_ext( ) can be set as an alternative.
> viewer preferences / view and print areas: When the viewarea, viewclip, printarea, and printclip suboptions for the viewerpreferences option in PDF_begin/end_document( ) are used, values other than media or bleed are not allowed.
> document info keys: Values other than True or False for the Trapped info key in PDF_set_info( ) or the corresponding XMP property pdf:Trapped must be avoided.
> security: The userpassword, masterpassword, and permissions options in PDF_begin_document( ) must be avoided.
> PDF version / compatibility: PDF/X-1a:2001 and PDF/X-3:2002 are based on PDF 1.3. Operations that require PDF 1.4 or above (such as transparency or soft masks) must be avoided. PDF/X-1a:2003 and PDF/X-3:2003 are based on PDF 1.4. Operations that require PDF 1.5 or above must be avoided. PDF/X-4 and PDF/X-5 are based on PDF 1.6. Operations that require PDF 1.7 or above must be avoided.
> PDF import (PDI): Imported documents must conform to a compatible PDF/X level according to Table 10.6, and must have been prepared according to the same output intent.
> external graphical content (references): PDF/X-1/3/4: The reference option in PDF_begin_template_ext( ) and PDF_open_pdi_page( ) must be avoided. PDF/X-5g and PDF/X-5pg: the target provided in the reference option in PDF_begin_template_ext( ) and PDF_open_pdi_page( ) must conform to one of the following standards: PDF/X-1a:2003, PDF/X-3:2002, PDF/X-4, PDF/X-4p, PDF/X-5g, or PDF/X-5pg. Since certain XMP metadata entries are required in the target, not all PDF/X documents are acceptable as target. PDF/X documents generated with PDFlib 8 can be used as target. See Section 3.2.5, Referenced Pages from external PDF Documents, page 65, for more details on the reference option and the required Acrobat configuration.
> layers: PDF/X-1 and PDF/X-3: layers require PDF 1.5 and can therefore not be used. PDF/X-4 and PDF/X-5: layers can be used, but certain PDF/X rules must be obeyed:
  > Layer visibility must be controlled with document variants instead of giving the viewer control over individual layers. Variants can be created with PDF_set_layer_dependency( ), the parameter type=variant and various options for variants.
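The PDF version entry of Table 10.4 can be captured in a lookup table. This Java sketch assumes that PDF/X-4p, PDF/X-5g, and PDF/X-5pg share the PDF 1.6 base stated for PDF/X-4 and PDF/X-5; PdfxBaseVersion is a hypothetical helper. A permitted PDF version is necessary but not sufficient: layers, for example, pass the version check for PDF/X-4 but must still be controlled with document variants.

```java
import java.util.Map;

/** Sketch: base PDF version of each PDF/X level, per the PDF version / compatibility entry. */
class PdfxBaseVersion {
    // PDF/X-4p, PDF/X-5g and PDF/X-5pg are assumed to share the PDF 1.6 base.
    static final Map<String, String> BASE = Map.of(
        "PDF/X-1a:2001", "1.3",
        "PDF/X-3:2002",  "1.3",
        "PDF/X-1a:2003", "1.4",
        "PDF/X-3:2003",  "1.4",
        "PDF/X-4",       "1.6",
        "PDF/X-4p",      "1.6",
        "PDF/X-5g",      "1.6",
        "PDF/X-5pg",     "1.6");

    /** minVersion is a plain version number such as "1.5" (not "1.7ext3"). */
    static boolean featureAllowed(String pdfxLevel, String minVersion) {
        String base = BASE.get(pdfxLevel);
        if (base == null) {
            throw new IllegalArgumentException("unknown PDF/X level: " + pdfxLevel);
        }
        // Versions 1.3 to 1.7 compare correctly as decimal numbers.
        return Double.parseDouble(minVersion) <= Double.parseDouble(base);
    }
}
```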
Output intent and standard output conditions. The output condition defines the intended target device, which is mainly useful for reliable proofing. The output intent can be specified in one of the following ways:
> PDF/X-1/3/4 and PDF/X-5g: by embedding an ICC profile for the output intent.
> PDF/X-1 and PDF/X-3: by supplying the name of a standard output intent. The standard output intents are known internally to PDFlib; see PDFlib Reference for a complete list of the standard output intent names and a description of the corresponding printing conditions. ICC profiles for these output intents are not required to be available locally. Additional standard output intents can be defined using the StandardOutputIntent resource category (see Section 3.1.3, Resource Configuration and File Searching, page 52). It is the user's responsibility to add only those names as standard output intents which will be recognized by PDF/X-processing software. Standard output intents can be referenced as follows:
if (p.load_iccprofile("CGATS TR 001", "usage=outputintent") == -1) { /* Error */ }
When creating PDF/X-3 output and using any of HKS, PANTONE, ICC-based, or Lab colors, referencing the name of a standard output intent is not sufficient; an ICC profile of the output device must be embedded.
> PDF/X-4p and PDF/X-5pg: by referencing an external ICC profile for the output intent (the p in the name of the standard means that an external ICC profile is referenced). Unlike standard output intents, the output intent ICC profile is not only referenced by name, but a strong reference is created which requires the ICC profile to be locally available when the document is generated. Although the ICC profile will not be embedded in the PDF output, it must nevertheless be available at PDF creation time to create a strong reference. The urls option must be provided with one or more valid URLs where the ICC profile can be found:
if (p.load_iccprofile("CGATS TR 001", "usage=outputintent urls={http://www.color.org}") == -1) { /* Error */ }
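Which of the two load_iccprofile( ) calls applies depends on the PDF/X level. The following Java sketch returns the option list for the levels named above, assuming that only PDF/X-4p and PDF/X-5pg reference the profile externally and therefore need the urls option; OutputIntentOptions is a hypothetical helper.

```java
/** Sketch: option list for PDF_load_iccprofile( ) when loading a PDF/X output intent. */
class OutputIntentOptions {
    static String forLevel(String pdfxLevel, String... urls) {
        // Only the "p" variants reference the output intent profile externally.
        boolean external = pdfxLevel.equals("PDF/X-4p") || pdfxLevel.equals("PDF/X-5pg");
        if (!external) {
            return "usage=outputintent";
        }
        if (urls.length == 0) {
            throw new IllegalArgumentException(pdfxLevel + " requires the urls option");
        }
        return "usage=outputintent urls={" + String.join(" ", urls) + "}";
    }
}
```

For example, forLevel("PDF/X-4p", "http://www.color.org") yields the option list used in the second call above.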
Selecting a suitable PDF/X output intent. The PDF/X output intent is usually selected as a result of discussions between you and your print service provider who will take care of print production. If your printer cannot provide any information regarding the choice of output intent, you can use the standard output intents listed in Table 10.5 as a starting point (taken from the PDF/X FAQ).
Table 10.5 Suitable PDF/X output intents for common printing situations
> Magazine ads: Europe: FOGRA28; North America: CGATS TR 00
> Newsprint ads: Europe: IFRA26; North America: IFRA30
> Sheet-fed offset (dependent on paper stock):
  Europe: Types 1 & 2 (coated): FOGRA27; Type 3 (LWC): FOGRA28; Type 4 (uncoated): FOGRA29
  North America: Grades 1 and 2 (premium coated): FOGRA27; Grade 5: CGATS TR 001; Uncoated: FOGRA29
> Web-fed offset (dependent on paper stock):
  Europe: Types 1 & 2 (coated): FOGRA28; Type 4 (uncoated, white): FOGRA29; Type 5 (uncoated, yellowish): FOGRA30
  North America: Grade 5: CGATS TR 001; Uncoated (white): FOGRA29; Uncoated (yellowish): FOGRA30
Table 10.6 Compatibility of imported PDF/X documents with the PDF/X output levels PDF/X-1a:2001, PDF/X-1a:2003, PDF/X-3:2002, PDF/X-3:2003, PDF/X-4, PDF/X-4p, PDF/X-5g, and PDF/X-5pg. Notes on combinations which are allowed:
1. PDF_process_pdi( ) with action=copyoutputintent will copy the reference to the external output intent ICC profile.
2. If the imported page contains referenced XObjects, PDF_open_pdi_page( ) will copy both proxy and reference to the target.
If multiple PDF/X documents are imported, they must all have been prepared for the same output condition. For example, only documents with a CMYK output intent can be imported into a document which uses the same CMYK output intent. While PDFlib can correct certain items, it is not intended to work as a full PDF/X validator or to enforce full PDF/X compatibility for imported documents. For example, PDFlib will not embed fonts which are missing from imported PDF pages, and does not apply any color correction to imported pages. If you want to combine imported pages such that the resulting PDF output document conforms to the same PDF/X conformance level and output condition as the input document(s), you can query the PDF/X status of the imported PDF as follows:
pdfxlevel = p.pcos_get_string(doc, "pdfx");
This statement will retrieve a string designating the PDF/X conformance level of the imported document if it conforms to an ISO PDF/X level, or none otherwise. The returned string can be used to set the PDF/X conformance level of the output document appropriately, using the pdfx option in PDF_begin_document( ). Copying the PDF/X output intent from an imported document. In addition to querying the PDF/X conformance level you can also copy the output intent from an imported document:
ret = p.process_pdi(doc, -1, "action=copyoutputintent");
This can be used as an alternative to setting the output intent via PDF_load_iccprofile( ), and will copy the imported document's output intent to the generated output document, regardless of whether it is defined by a standard name or an ICC profile. Copying the output intent works for imported PDF/A and PDF/X documents. The output intent of the generated output document must be set exactly once, either by copying an imported document's output intent, or by setting it explicitly using PDF_load_iccprofile( ) with usage=outputintent.
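The two steps above, deriving the pdfx option from the imported document and setting the output intent exactly once, can be sketched in Java as follows. PdfxFromImport is a hypothetical helper: pdfxOption() expects the string returned by p.pcos_get_string(doc, "pdfx"), and claimOutputIntent() is meant to be called right before either process_pdi( ) with action=copyoutputintent or load_iccprofile( ) with usage=outputintent.

```java
/** Sketch: derive the pdfx option and guard the "exactly once" output intent rule. */
class PdfxFromImport {
    private boolean outputIntentSet = false;

    /** Maps the pCOS "pdfx" value to a pdfx option fragment; "none" yields "". */
    static String pdfxOption(String importedLevel) {
        if (importedLevel == null || importedLevel.equals("none")) {
            return "";
        }
        return "pdfx=" + importedLevel;
    }

    /** Fails if an output intent has already been copied or loaded for this document. */
    void claimOutputIntent() {
        if (outputIntentSet) {
            throw new IllegalStateException("output intent already set");
        }
        outputIntentSet = true;
    }
}
```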
PDF/A
> Additional rules apply when importing pages from existing PDF/A-conforming documents (see Section 10.5.3, Importing PDF/A Documents with PDI, page 253).
If the PDFlib client program obeys these rules, valid PDF/A output is guaranteed. If PDFlib detects a violation of the PDF/A creation rules, it will throw an exception which must be handled by the application. No PDF output will be created in case of an error.
Cookbook Code samples can be found in the Cookbook topics pdfa/starter_pdfa1b, pdfa/text_to_pdfa, and pdfa/images_to_pdfa.
Required operations for PDF/A-1b. Table 10.7 lists all operations required to generate PDF/A-conforming output. The items apply to both PDF/A conformance levels unless otherwise noted. Not calling one of the required functions while in PDF/A mode will trigger an exception.
Table 10.7 Operations which must be applied for PDF/A-1 level A and B conformance (item: PDFlib function and option requirements for PDF/A conformance)
> conformance level: The pdfa option in PDF_begin_document( ) must be set to the required PDF/A conformance level, i.e. one of PDF/A-1a:2005 or PDF/A-1b:2005.
> output condition (output intent): PDF_load_iccprofile( ) with usage=outputintent or PDF_process_pdi( ) with action=copyoutputintent (but not both methods) must be called immediately after PDF_begin_document( ) if any of the device-dependent color spaces Gray, RGB, or CMYK is used in the document. If an output intent is used, an ICC profile must be embedded (unlike PDF/X, unembedded standard output conditions are not sufficient in PDF/A). Use the embedprofile option of PDF_load_iccprofile( ) to embed a profile for a standard output condition.
> The embedding option of PDF_load_font( ) (and other functions which accept this option) must be true. Note that embedding is also required for the PDF core fonts.
> Grayscale images and PDF_setcolor( ) with a gray color space can only be used if the output condition is a grayscale, RGB, or CMYK device, or if the defaultgray option in PDF_begin_page_ext( ) has been set.
> RGB images and PDF_setcolor( ) with an RGB color space can only be used if the output condition is an RGB device, or the defaultrgb option in PDF_begin_page_ext( ) has been set.
> CMYK images and PDF_setcolor( ) with a CMYK color space can only be used if the output condition is a CMYK device, or the defaultcmyk option in PDF_begin_page_ext( ) has been set.
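The color requirements of Table 10.7 differ from PDF/X in one respect: in PDF/A, gray may also be used with an RGB output condition. A Java sketch of these checks, assuming the output intent can be classified as a grayscale, RGB, or CMYK device and null means that no output intent was loaded or copied; PdfaColorCheck is a hypothetical helper.

```java
import java.util.Set;

/** Sketch of the PDF/A-1 color rules in Table 10.7. */
class PdfaColorCheck {
    enum Space { GRAY, RGB, CMYK }

    /** An output intent is required as soon as any device-dependent color space is used. */
    static boolean needsOutputIntent(Set<Space> deviceSpacesUsed) {
        return !deviceSpacesUsed.isEmpty();
    }

    /** defaultSet stands for the matching defaultgray/defaultrgb/defaultcmyk option. */
    static boolean allowed(Space used, Space outputIntent, boolean defaultSet) {
        if (defaultSet) {
            return true;
        }
        if (outputIntent == null) {
            return false;
        }
        switch (used) {
            case GRAY: return true; // grayscale, RGB, and CMYK output conditions all accept gray
            case RGB:  return outputIntent == Space.RGB;
            case CMYK: return outputIntent == Space.CMYK;
            default:   throw new IllegalArgumentException("unknown color space: " + used);
        }
    }
}
```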
Prohibited and restricted operations. Table 10.8 lists all operations which are prohibited when generating PDF/A-conforming output. The items apply to both PDF/A conformance levels unless otherwise noted. Calling one of the prohibited functions while in PDF/A mode will trigger an exception. Similarly, if an imported PDF document does not conform to the current PDF/A output level, the corresponding PDI call will fail.
Table 10.8 Operations which must be avoided or are restricted to achieve PDF/A conformance (item: prohibited or restricted PDFlib functions and options for PDF/A conformance)
> annotations: PDF_create_annotation( ): annotations with type=FileAttachment must be avoided; the zoom and rotate options must not be set to true. The annotcolor and interiorcolor options must only be used if an RGB output intent has been specified. The fillcolor option must only be used if an RGB or CMYK output intent has been specified, and a corresponding rgb or cmyk color space must be used.
> form fields: PDF_create_field( ) and PDF_create_fieldgroup( ) for creating form fields must be avoided.
> actions and JavaScript: PDF_create_action( ): actions with type=Hide, Launch, Movie, ResetForm, ImportData, JavaScript must be avoided; for type=Named only NextPage, PrevPage, FirstPage, and LastPage are allowed.
> images: The OPI-1.3 and OPI-2.0 options and the interpolate=true option in PDF_load_image( ) must be avoided.
> ICC profiles: ICC profiles loaded with PDF_load_iccprofile( ) must comply with ICC specification ICC.1:1998-09 and its addendum ICC.1A:1999-04 (internal profile version 2.x).
> page sizes: There are no strict page size limits in PDF/A. However, it is recommended to keep the page size (width and height, and all box entries) in the range 3...14400 points (508 cm) to avoid problems with Acrobat.
> templates: The OPI-1.3 and OPI-2.0 options in PDF_begin_template_ext( ) must be avoided.
> transparency: Soft masks for images must be avoided: the masked option for PDF_load_image( ) must be avoided unless the mask refers to a 1-bit image. Images with implicit transparency (alpha channel) are not allowed; they must be loaded with the ignoremask option of PDF_load_image( ). The opacityfill and opacitystroke options for PDF_create_gstate( ) must be avoided unless they have a value of 1; if blendmode is used it must be Normal. The opacity option in PDF_create_annotation( ) must be avoided.
> transparency groups: The transparencygroup option of PDF_begin/end_page_ext( ), PDF_begin_template_ext( ), and PDF_open_pdi_page( ) is not allowed.
> security: The userpassword, masterpassword, and permissions options in PDF_begin_document( ) must be avoided.
> PDF version / compatibility: PDF/A is based on PDF 1.4. Operations that require PDF 1.5 or above (such as layers) must be avoided.
> PDF import (PDI): Imported documents must conform to a PDF/A level which is compatible with the output document, and must have been prepared according to a compatible output intent (see Table 10.11).
> metadata: All predefined XMP schemas (see PDFlib Reference) can be used. In order to use other schemas (extension schemas), the corresponding description must be embedded using the PDF/A extension schema container schema.
> External content: The reference option in PDF_begin_template_ext( ) and PDF_open_pdi_page( ) must be avoided.
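Two entries of Table 10.8 lend themselves to simple pre-checks in application code: the restricted action types and the recommended page size range. A Java sketch, assuming named actions are created with type=Named; PdfaActionCheck is a hypothetical helper, and it treats every action type not listed in the table as allowed, which reflects only this table and not the full PDF/A rules.

```java
import java.util.Set;

/** Sketch of two PDF/A checks from Table 10.8: action types and recommended page size. */
class PdfaActionCheck {
    // Action types which must be avoided in PDF/A.
    private static final Set<String> FORBIDDEN =
        Set.of("Hide", "Launch", "Movie", "ResetForm", "ImportData", "JavaScript");
    // The only named actions allowed in PDF/A.
    private static final Set<String> NAMED_OK =
        Set.of("NextPage", "PrevPage", "FirstPage", "LastPage");

    /** name is the named action for type=Named and is ignored for other types. */
    static boolean actionAllowed(String type, String name) {
        if (FORBIDDEN.contains(type)) {
            return false;
        }
        if (type.equals("Named")) {
            return name != null && NAMED_OK.contains(name); // Set.of rejects contains(null)
        }
        return true; // not restricted by Table 10.8
    }

    /** Recommended (not mandatory) range: 3 to 14400 points for width and height. */
    static boolean pageSizeRecommended(double width, double height) {
        return width >= 3 && width <= 14400 && height >= 3 && height <= 14400;
    }
}
```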
Additional requirements and restrictions for PDF/A-1a. When creating PDF/A-1a, all requirements for creating Tagged PDF output as discussed in Section 10.6, Tagged PDF, page 258, must be met. In addition, some operations are not allowed or restricted as detailed in Table 10.9. The user is responsible for creating suitable structure information; PDFlib neither checks nor enforces any semantic restrictions. A document which contains all of its text in a single structure element is technically correct PDF/A-1a, but violates the goal of faithful semantic reproduction, and therefore the spirit of PDF/A-1a.
Table 10.9 Additional requirements for PDF/A-1a conformance (item: PDFlib function and option requirements for PDF/A-1a conformance)
> Tagged PDF: All requirements for Tagged PDF must be met (see Section 10.6, Tagged PDF, page 258). The following are strongly recommended:
  > The Lang option should be supplied in PDF_begin/end_document( ) to specify the default document language.
  > The Lang option should be specified properly in PDF_begin_item( ) for all content items which differ from the default document language.
  > Non-textual content items, e.g. images, should supply an alternate text description using the Alt option of PDF_begin_item( ).
  > Non-Unicode text, e.g. logos and symbols, should have appropriate replacement text specified in the ActualText option of PDF_begin_item( ) for the enclosing content item.
  > Abbreviations and acronyms should have appropriate expansion text specified in the E option of PDF_begin_item( ) for the enclosing content item.
> annotations: PDF_create_annotation( ): a non-empty string must be supplied for the contents option.
Table 10.10 Additional operations which must be avoided or are restricted for PDF/A-1a conformance
item | Prohibited or restricted PDFlib functions and options for PDF/A-1a conformance
fonts | The monospace option, unicodemap=false, and autocidfont=false in PDF_load_font( ) (and other functions which accept these options) must be avoided.
PDF import (PDI) | Imported documents must conform to a PDF/A level which is compatible with the output document (see Table 10.11), and must have been prepared according to the same output intent.
Output intents. The output condition defines the intended target device, which is important for consistent color rendering. Unlike PDF/X, which strictly requires an output intent, PDF/A allows the specification of an output intent, but does not require it. An output intent is only required if device-dependent colors are used in the document. The output intent can be specified with an ICC profile. Output intents can be specified as follows:
icc = p.load_iccprofile("sRGB", "usage=outputintent");
As an alternative to loading an ICC profile, the output intent can also be copied from an imported PDF/A document using PDF_process_pdi( ) with the option action=copyoutputintent.
Creating PDF/A and PDF/X at the same time. A PDF/A-1 document can at the same time conform to PDF/X-1a:2003, PDF/X-3:2003, or PDF/X-4 (but not to PDF/X-4p or PDF/X-5). In order to create such a combo file, supply appropriate values for the pdfa and pdfx options of PDF_begin_document( ), e.g.:
ret = p.begin_document("combo.pdf", "pdfx=PDF/X-4 pdfa=PDF/A-1b:2005");
The output intent must be the same for PDF/A and PDF/X, and must be specified as an output device ICC profile. PDF/X standard output conditions can only be used in combination with the embedprofile option.
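A minimal sketch, assuming you want to reject unsupported combinations before calling PDF_begin_document( ): the hypothetical Java helper below only accepts the PDF/X flavors named above together with a PDF/A-1 level, and returns the option list string used in the example.

```java
import java.util.Set;

// Hypothetical helper which builds the option list for a PDF/A + PDF/X combo file.
final class ComboOptions {
    // PDF/X flavors which can be combined with PDF/A-1 according to the text above
    private static final Set<String> COMBINABLE_PDFX =
            Set.of("PDF/X-1a:2003", "PDF/X-3:2003", "PDF/X-4");

    static String comboOptlist(String pdfx, String pdfa) {
        if (pdfx == null || !COMBINABLE_PDFX.contains(pdfx)) {
            throw new IllegalArgumentException(pdfx + " cannot be combined with PDF/A-1");
        }
        if (pdfa == null || !pdfa.startsWith("PDF/A-1")) {
            throw new IllegalArgumentException("expected a PDF/A-1 level, got " + pdfa);
        }
        return "pdfx=" + pdfx + " pdfa=" + pdfa;
    }

    public static void main(String[] args) {
        System.out.println(comboOptlist("PDF/X-4", "PDF/A-1b:2005"));
        // prints: pdfx=PDF/X-4 pdfa=PDF/A-1b:2005
    }
}
```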
Cookbook A full code sample can be found in the Cookbook topic pdfa/import_pdfa.
If one or more PDF/A documents are imported, they must all have been prepared for a compatible output condition according to Table 10.12. The output intents in all imported documents must be identical or compatible; it is the user's responsibility to make sure that this condition is met.
Table 10.12 Output intent compatibility when importing PDF/A documents
output intent of imported document | generated: none | generated: Grayscale ICC profile | generated: RGB ICC profile | generated: CMYK ICC profile
none | yes | yes | yes | yes
Grayscale | - | yes¹ | - | -
RGB | - | - | yes¹ | -
CMYK | - | - | - | yes¹
1. Output intent of the imported document and output intent of the generated document must be identical.
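Read together with its footnote, Table 10.12 reduces to a simple rule: an imported document without output intent can always be used; otherwise both output intents must be identical. The Java sketch below expresses this rule. It assumes that each output intent is represented by some identifier string (for example a profile name) and uses null for "no output intent"; how such identifiers are obtained is not shown and is an assumption of the sketch.

```java
// Hypothetical check for Table 10.12; the identifiers are supplied by the caller.
final class OutputIntentCompat {
    /**
     * imported, generated: identifiers of the output intent profiles, or null for "none".
     * Returns true if pages of the imported document may be placed into the generated one.
     */
    static boolean canImport(String imported, String generated) {
        if (imported == null) {
            return true; // row "none": compatible with every output intent
        }
        return imported.equals(generated); // footnote 1: output intents must be identical
    }

    public static void main(String[] args) {
        System.out.println(canImport(null, "sRGB"));   // true
        System.out.println(canImport("sRGB", "sRGB")); // true
        System.out.println(canImport("sRGB", null));   // false
    }
}
```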
While PDFlib can correct certain items, it is not intended to work as a full PDF/A validator or to enforce full PDF/A conformance for imported documents. For example, PDFlib will not embed fonts which are missing from imported PDF pages. If you want to combine imported pages such that the resulting PDF output document conforms to the same PDF/A conformance level and output condition as the input document(s), you can query the PDF/A status of the imported PDF as follows:
pdfalevel = p.pcos_get_string(doc, "pdfa");
This statement will retrieve a string designating the PDF/A conformance level of the imported document if it conforms to a PDF/A level, or the string none otherwise. The returned string can be used to set the PDF/A conformance level of the output document appropriately, using the pdfa option in PDF_begin_document( ).
Copying the PDF/A output intent from an imported document. In addition to querying the PDF/A conformance level you can also copy the PDF/A output intent from an imported document. Since PDF/A documents do not necessarily contain any output intent (unlike PDF/X, which requires an output intent), you must first use pCOS to check for the existence of an output intent before attempting to copy it.
Cookbook A full code sample can be found in the Cookbook topic pdfa/import_pdfa.
Copying the output intent can be used as an alternative to setting it via PDF_load_iccprofile( ), and will copy the imported document's output intent to the generated output document. Copying the output intent works for imported PDF/A and PDF/X documents. The output intent of the generated output document must be set exactly once, either by copying an imported document's output intent, or by setting it explicitly using PDF_load_iccprofile( ) with the usage option set to outputintent. The output intent should be set immediately after PDF_begin_document( ).
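The pcos_get_string( ) result can be mapped directly to the pdfa option of PDF_begin_document( ). The following hypothetical Java helper (not part of PDFlib) shows one way to do this; it relies on the string none being returned for non-PDF/A input, as described above, and returns an empty option string in that case.

```java
// Hypothetical mapping from the pCOS "pdfa" query result to a begin_document() option.
final class PdfaLevelOption {
    static String pdfaOption(String pdfalevel) {
        if (pdfalevel == null || pdfalevel.equals("none")) {
            return ""; // the imported document does not claim PDF/A conformance
        }
        return "pdfa=" + pdfalevel;
    }

    public static void main(String[] args) {
        System.out.println(pdfaOption("PDF/A-1b:2005"));  // pdfa=PDF/A-1b:2005
        System.out.println(pdfaOption("none").isEmpty()); // true
    }
}
```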
RGB² | yes
CMYK² | yes
1. LZW-compressed TIFF images with CIELab color will be converted to RGB.
2. Device color space without any ICC profile.
In order to create black text output without the need for any output intent profile, the CIELab color space can be used. The Lab color value (0, 0, 0) specifies pure black in a device-independent manner, and is PDF/A-conforming without any output intent profile (unlike DeviceGray, which requires an output intent profile). PDFlib will automatically initialize the current color to black at the beginning of each page. Depending on whether or not an ICC output intent has been specified, it will use the DeviceGray or Lab color space for selecting black. Use the following call to manually set Lab black color:
p.setcolor("fillstroke", "lab", 0, 0, 0, 0);
In addition to the color spaces listed in Table 10.13, spot colors can be used subject to the corresponding alternate color space. Since PDFlib uses CIELab as the alternate color space for the builtin HKS and PANTONE spot colors, these can always be used with PDF/A. For custom spot colors the alternate color space must be chosen so that it is compatible with the PDF/A output intent.
Note More information on PDF/A and color spaces can be found in Technical Note 0002 of the PDF/A Competence Center at www.pdfa.org.
1. See www.aiim.org/documents/standards/xmpspecification.pdf
Table 10.14 Predefined XMP schemas for PDF/A-1
Schema name and description (see XMP 2004 for details) | namespace URI | preferred namespace prefix
XMP Basic schema | http://ns.adobe.com/xap/1.0/ | xmp
XMP Media Management schema | http://ns.adobe.com/xap/1.0/mm/ | xmpMM
XMP Paged-Text schema | http://ns.adobe.com/xap/1.0/t/pg/ | xmpTPg
XMP Rights Management schema | http://ns.adobe.com/xap/1.0/rights/ | xmpRights
XMP extension schemas. If your metadata requirements are not covered by the predefined schemas you can define an XMP extension schema. PDF/A-1 describes an extension mechanism which must be used when custom schemas are to be embedded in a PDF/A document. Table 10.15 summarizes the schemas which must be used for describing one or more extension schemas, along with their namespace URI and the required namespace prefix. Note that the namespace prefixes are required (unlike the preferred namespace prefixes for predefined schemas). The details of constructing an XMP extension schema for PDF/A-1 are beyond the scope of this manual. Detailed instructions are available from the PDF/A Competence Center. XMP document metadata packages can be supplied to the metadata options of PDF_begin_document( ), PDF_end_document( ), or both.
Cookbook Full code and XMP samples can be found in the Cookbook topics pdfa/pdfa_extension_schema and pdfa/pdfa_extension_schema_with_type.
Table 10.15 PDF/A-1 extension schema container schema and auxiliary schemas
Schema name and description | namespace URI¹ | required namespace prefix
PDF/A extension schema container schema: container for all embedded extension schema descriptions | http://www.aiim.org/pdfa/ns/extension/ | pdfaExtension
PDF/A schema value type: describes a single extension schema with an arbitrary number of properties | http://www.aiim.org/pdfa/ns/schema# | pdfaSchema
PDF/A property value type: describes a single property | http://www.aiim.org/pdfa/ns/property# | pdfaProperty
PDF/A ValueType value type: describes a custom value type used in extension schema properties; only required if types beyond the XMP 2004 list of types are used | http://www.aiim.org/pdfa/ns/type# | pdfaType
 | | pdfaField
1. Note that the namespace URIs are incorrectly listed in ISO 19005-1, and have been corrected in Technical Corrigendum 1.
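Because the prefixes in Table 10.15 are required rather than merely preferred, a client which assembles XMP by hand may want to check its namespace bindings. The Java sketch below is a hypothetical check which covers only the four namespace URIs listed with a URI in the table (the pdfaField row shows no URI here and is therefore omitted); bindings for any other namespace are accepted unchanged.

```java
import java.util.Map;
import java.util.Objects;

// Hypothetical check of namespace prefix bindings against Table 10.15.
final class PdfaExtensionPrefixes {
    // namespace URI -> required prefix
    private static final Map<String, String> REQUIRED_PREFIX = Map.of(
            "http://www.aiim.org/pdfa/ns/extension/", "pdfaExtension",
            "http://www.aiim.org/pdfa/ns/schema#", "pdfaSchema",
            "http://www.aiim.org/pdfa/ns/property#", "pdfaProperty",
            "http://www.aiim.org/pdfa/ns/type#", "pdfaType");

    /** Returns false only if a namespace from the table is bound to a different prefix. */
    static boolean isPrefixAcceptable(String namespaceUri, String prefix) {
        String required = REQUIRED_PREFIX.get(Objects.requireNonNull(namespaceUri));
        // predefined schemas only have preferred prefixes, so they are not checked here
        return required == null || required.equals(prefix);
    }

    public static void main(String[] args) {
        System.out.println(isPrefixAcceptable("http://www.aiim.org/pdfa/ns/schema#", "pdfaSchema")); // true
        System.out.println(isPrefixAcceptable("http://www.aiim.org/pdfa/ns/schema#", "schema"));     // false
    }
}
```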
Unicode-compatible text output. When generating Tagged PDF, all text output must use fonts which are Unicode-compatible as detailed in Section 4.3.4, Unicode-compatible Fonts, page 90. This means that all fonts used must provide a mapping to Unicode. Non-Unicode-compatible fonts are only allowed if alternate text is provided for the content via the ActualText or Alt options in PDF_begin_item( ). PDFlib will throw an exception if text without proper Unicode mapping is used while generating Tagged PDF.
Note In some cases PDFlib will not be able to detect problems with wrongly encoded fonts, for example symbol fonts encoded as text fonts. Also, due to historical problems, PostScript fonts with certain typographical variations (e.g., expert fonts) are likely to result in inaccessible output.
Page content ordering. The ordering of text, graphics, and image operators which define the contents of the page is referred to as the content stream ordering; the content ordering defined by the logical structure tree is referred to as logical ordering. Tagged PDF generation requires the client to obey certain rules regarding content ordering. The natural and recommended method is to sequentially generate all constituent parts of a structure element, and then move on to the next element. In technical terms, the structure tree should be created during a single depth-first traversal.
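To illustrate what a single depth-first traversal means, the following Java sketch walks a toy structure tree once and records the sequence of begin/end calls a client would issue. The Node record and the recorded strings are hypothetical stand-ins for PDF_begin_item( ) and PDF_end_item( ); no PDFlib function is called.

```java
import java.util.ArrayList;
import java.util.List;

// Toy structure tree; emit() visits every element exactly once, depth-first.
final class DepthFirstStructure {
    record Node(String tag, List<Node> children) {
        Node(String tag, Node... children) {
            this(tag, List.of(children));
        }
    }

    /** Records "begin"/"end" in the order a client would open and close the items. */
    static void emit(Node node, List<String> calls) {
        calls.add("begin " + node.tag());
        for (Node child : node.children()) {
            emit(child, calls); // finish the whole child before moving on to its sibling
        }
        calls.add("end " + node.tag());
    }

    public static void main(String[] args) {
        Node art = new Node("Art",
                new Node("Sect", new Node("P"), new Node("P")),
                new Node("Table"));
        List<String> calls = new ArrayList<>();
        emit(art, calls);
        System.out.println(calls);
        // [begin Art, begin Sect, begin P, end P, begin P, end P, end Sect, begin Table, end Table, end Art]
    }
}
```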
A different method which should be avoided is to output parts of the first element, switch to parts of the next element, return to the first, etc. In this method the structure tree is created in multiple traversals, where each traversal generates only parts of an element.
Importing Pages with PDI. Pages from Tagged PDF documents or other PDF documents containing structure information cannot be imported in Tagged PDF mode since the imported document structure would interfere with the generated structure. Pages from unstructured documents can be imported, however. Note that they will be treated "as is" by Acrobat's accessibility features unless they are tagged with appropriate ActualText.
Artifacts. Graphic or text objects which are not part of the author's original content are called artifacts. Artifacts should be identified as such using the Artifact pseudo tag, and classified according to one of the following categories:
> Pagination: features such as running heads and page numbers
> Layout: typographic or design elements such as rules and table shadings
> Page: production aids, such as trim marks and color bars
Although artifact identification is not strictly required, it is strongly recommended to aid text reflow and accessibility.
Inline items. PDF defines block-level structure elements (BLSE) and inline-level structure elements (ILSE) (see the PDFlib Reference for a precise definition). BLSEs may contain other BLSEs or actual content, while ILSEs always directly contain content. In addition, PDFlib makes the following distinction:
Table 10.17 Regular and inline items
property | regular items | inline items
affected items | all grouping elements and BLSEs | all ILSEs and non-structural tags (pseudo tags)
regular/inline status can be changed | no | only for ASpan items
part of the document's structure tree | yes | no
can cross page boundaries | yes | no
can be interrupted by other items | yes | no
can be suspended and activated | yes | no
can be nested to an arbitrary depth | yes | only with other inline items
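Table 10.17 states that inline items cannot cross page boundaries. A client can enforce this in its own code with simple bookkeeping; the Java class below is a hypothetical sketch (item IDs and page numbers are plain integers supplied by the caller, not PDFlib handles) which throws if an item is closed on a different page than the one where it was opened.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical client-side bookkeeping for inline items.
final class InlineItemTracker {
    // item id -> page number on which the inline item was opened
    private final Map<Integer, Integer> openedOnPage = new HashMap<>();

    void open(int itemId, int page) {
        openedOnPage.put(itemId, page);
    }

    void close(int itemId, int page) {
        Integer opened = openedOnPage.remove(itemId);
        if (opened == null) {
            throw new IllegalStateException("inline item " + itemId + " is not open");
        }
        if (opened != page) { // unboxed comparison of page numbers
            throw new IllegalStateException("inline item " + itemId + " opened on page "
                    + opened + " but closed on page " + page);
        }
    }

    public static void main(String[] args) {
        InlineItemTracker tracker = new InlineItemTracker();
        tracker.open(7, 1);
        tracker.close(7, 1); // fine: same page
        tracker.open(8, 1);
        try {
            tracker.close(8, 2);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```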
The regular vs. inline decision for ASpan items is under client control via the inline option of PDF_begin_item( ). Forcing an accessibility span to be regular (inline=false) is recommended, for example, when a paragraph which is split across several pages contains multiple languages. Alternatively, the item could be closed, and a new item started on the next page. Inline items must be closed on the page where they have been opened.
Recommended operations. Table 10.18 lists operations which are optional but recommended when generating Tagged PDF output. They are not strictly required, but improve the quality of the generated Tagged PDF.
Table 10.18 Operations which are recommended for generating Tagged PDF
item | Recommended PDFlib functions and options for Tagged PDF compatibility
hyphenation | Word breaks (separating words in two parts at the end of a line) should be presented using a soft hyphen character (U+00AD) as opposed to a hard hyphen (U+002D).
word boundaries | Words should be separated by space characters (U+0020) even if this would not strictly be required for positioning. The autospace parameter can be used for automatically generating space characters after each call to one of the show functions.
artifacts | In order to distinguish real content from page artifacts, artifacts should be identified as such using PDF_begin_item( ) with tag=Artifact.
Type 3 fonts | The familyname, stretch, and weight options of PDF_begin_font( ) should be supplied with reasonable values for all Type 3 fonts used in a Tagged PDF document.
interactive elements | Interactive elements, e.g. links, should be included in the document structure and made accessible if required, e.g. by supplying alternate text. The tab order for interactive elements can be specified with the taborder option of PDF_begin/end_document( ) (this is not necessary if the interactive elements are properly included in the document structure).
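The hyphenation row can be illustrated with a small Java sketch: when a word is split at the end of a line, the first fragment ends with U+00AD (soft hyphen) instead of U+002D. The helper name and the way the break position is chosen are assumptions; the sketch only shows which character to emit.

```java
// Hypothetical helper: split a word for a line break using a soft hyphen.
final class SoftHyphenBreak {
    static final char SOFT_HYPHEN = '\u00AD';

    /** Splits word at index 'at'; returns {end of current line, start of next line}. */
    static String[] breakWord(String word, int at) {
        if (at <= 0 || at >= word.length()) {
            throw new IllegalArgumentException("break position must be inside the word");
        }
        return new String[] { word.substring(0, at) + SOFT_HYPHEN, word.substring(at) };
    }

    public static void main(String[] args) {
        String[] parts = breakWord("accessibility", 6);
        System.out.println(parts[0].length() + " " + parts[1]); // 7 ibility
    }
}
```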
Prohibited operations. Table 10.19 lists all operations which are prohibited when generating Tagged PDF output. Calling one of the prohibited functions while in Tagged PDF mode will trigger an exception.
Table 10.19 Operations which must be avoided when generating Tagged PDF
item | PDFlib operations to be avoided for Tagged PDF compatibility
non-Unicode-compatible fonts | Fonts which are not Unicode-compatible according to Section 4.3.4, Unicode-compatible Fonts, page 90, must be avoided.
PDF import | Pages from PDF documents which contain structure information (in particular: Tagged PDF documents) must not be imported.
10.6.2 Creating Tagged PDF with direct Text Output and Textflows
Minimal Tagged PDF sample. The following sample code creates a very simple Tagged PDF document. Its structure tree contains only a single P element. The code uses the autospace feature to automatically generate space characters between fragments of text:
if (p.begin_document("hello-tagged.pdf", "tagged=true") == -1)
    throw new Exception("Error: " + p.get_errmsg());

/* automatically create spaces between chunks of text */
p.set_parameter("autospace", "true");

/* open the first structure element as a child of the document structure root (=0) */
id = p.begin_item("P", "Title={Simple Paragraph}");

p.begin_page_ext(0, 0, "width=a4.width height=a4.height");
font = p.load_font("Helvetica-Bold", "unicode", "");
p.setfont(font, 24);
p.show_xy("Hello, Tagged PDF!", 50, 700);
p.continue_text("This PDF has a very simple");
p.continue_text("document structure.");
p.end_page_ext("");

p.end_item(id);
p.end_document("");
Generating Tagged PDF with Textflow. The Textflow feature (see Section 8.2, Multi-Line Textflows, page 179) offers powerful text formatting capabilities. Since individual text fragments are no longer under client control, but will be formatted automatically by PDFlib, special care must be taken when generating Tagged PDF with Textflows:
> Textflows cannot contain individual structure elements, but the complete contents of a single Textflow fitbox can be contained in a structure element.
> All parts of a Textflow (all calls to PDF_fit_textflow( ) with a specific Textflow handle) should be contained in a single structure element.
> Since the parts of a Textflow could be spread over several pages which could contain other structure items, attention should be paid to choosing the proper parent item (rather than using a parent parameter of -1, which may point to the wrong parent element).
> If you use the matchbox feature for creating links or other annotations in a Textflow it is difficult to maintain control over the annotation's position in the structure tree.
/* 1 create top part of left column */
p.set_text_pos(x1_left, y1_left_top);
...
/* 2 create bottom part of left column */
p.set_text_pos(x1_left, y1_left_bottom);
...
/* 3 create top part of right column */
p.set_text_pos(x1_right, y1_right_top);
...
p.end_item(id_sect1);

id_sect2 = p.begin_item("Sect", "Title={Second Section}");
/* 4 create bottom part of right column */
p.set_text_pos(x2_right, y2_right);
...
/* second section may be continued on next page(s) */
p.end_item(id_sect2);

String optlist = "Title=Table parent=" + id_art;
id_table = p.begin_item("Table", optlist);
/* 5 create table structure and content */
p.set_text_pos(x_start_table, y_start_table);
...
p.end_item(id_table);

optlist = "Title=Insert parent=" + id_art;
id_insert = p.begin_item("P", optlist);
/* 6 create insert structure and content */
p.set_text_pos(x_start_table, y_start_table);
...
p.end_item(id_insert);

id_artifact = p.begin_item("Artifact", "");
/* 7+8 create header and footer */
p.set_text_pos(x_header, y_header);
...
p.set_text_pos(x_footer, y_footer);
...
p.end_item(id_artifact);

/* article may be continued on next page(s) */
...
p.end_item(id_art);
Fig. 10.1 Creating a complex page layout in logical structure order (left) and in visual order (right). The right variant uses item activation for the first section before continuing fragments 4 and 6.
Generating page contents in visual order. The logical order approach forces the creator to construct the page contents in logical order even if it might be easier to create it in visual order: header, left column upper part, table, left column lower part, insert, right column, footer. Using PDF_activate_item( ) this ordering can be implemented as follows:
/* create page layout in visual order */
id_header = p.begin_item("Artifact", "");
/* 1 create header */
p.set_text_pos(x_header, y_header);
...
p.end_item(id_header);

id_art = p.begin_item("Art", "Title=Article");
id_sect1 = p.begin_item("Sect", "Title={First Section}");
/* 2 create top part of left column */
p.set_text_pos(x1_left, y1_left_top);
...

String optlist = "Title=Table parent=" + id_art;
id_table = p.begin_item("Table", optlist);
/* 3 create table structure and content */
p.set_text_pos(x_start_table, y_start_table);
...
p.end_item(id_table);

/* continue with first section */
p.activate_item(id_sect1);
/* 4 create bottom part of left column */
p.set_text_pos(x1_left, y1_left_bottom);
...

optlist = "Title=Insert parent=" + id_art;
id_insert = p.begin_item("P", optlist);
/* 5 create insert structure and content */
p.set_text_pos(x_start_table, y_start_table);
...
p.end_item(id_insert);

/* still more contents for the first section */
p.activate_item(id_sect1);
/* 6 create top part of right column */
p.set_text_pos(x1_right, y1_right_top);
...
p.end_item(id_sect1);

id_sect2 = p.begin_item("Sect", "Title={Second Section}");
/* 7 create bottom part of right column */
p.set_text_pos(x2_right, y2_right);
...
/* second section may be continued on next page(s) */
p.end_item(id_sect2);

id_footer = p.begin_item("Artifact", "");
/* 8 create footer */
p.set_text_pos(x_footer, y_footer);
...
p.end_item(id_footer);

/* article may be continued on next page(s) */
...
p.end_item(id_art);
With this ordering of structure elements the main text (which spans one and a half columns) is interrupted twice for the table and the insert. Therefore it must also be activated twice using PDF_activate_item( ). The same technique can be applied if the content spans multiple pages. For example, the header or other inserts could be created first, and then the main page content element is activated again.
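The effect of activation can be simulated without PDFlib. In the toy Java model below (all names are hypothetical), each content fragment is attached to the item which is currently active, and activate( ) makes an earlier, still open item current again. Replaying the visual-order sequence for fragments 2 to 7 shows that fragments 2, 4, and 6 all end up in the first section, as in the logical-order version. As a simplification, end( ) leaves no item active instead of returning to the parent item.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model of item activation; not the PDFlib API.
final class ActivationModel {
    private final Map<String, List<Integer>> contents = new LinkedHashMap<>();
    private final Set<String> closed = new HashSet<>();
    private String current;

    void begin(String item) {
        contents.put(item, new ArrayList<>());
        current = item;
    }

    void end(String item) {
        closed.add(item);
        current = null; // simplification: no implicit return to the parent item
    }

    void activate(String item) {
        if (!contents.containsKey(item) || closed.contains(item)) {
            throw new IllegalStateException(item + " cannot be activated");
        }
        current = item;
    }

    void fragment(int number) {
        if (current == null) {
            throw new IllegalStateException("no active item for fragment " + number);
        }
        contents.get(current).add(number);
    }

    Map<String, List<Integer>> contents() {
        return contents;
    }

    public static void main(String[] args) {
        ActivationModel m = new ActivationModel();
        m.begin("Sect1"); m.fragment(2);
        m.begin("Table"); m.fragment(3); m.end("Table");
        m.activate("Sect1"); m.fragment(4);                    // first activation
        m.begin("Insert"); m.fragment(5); m.end("Insert");
        m.activate("Sect1"); m.fragment(6); m.end("Sect1");    // second activation
        m.begin("Sect2"); m.fragment(7); m.end("Sect2");
        System.out.println(m.contents());
        // {Sect1=[2, 4, 6], Table=[3], Insert=[5], Sect2=[7]}
    }
}
```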
> ... recommended to put the direct elements before the first child elements.
> Structure items with mixed types of children (i.e., both page content sequences and non-inline structure elements) should be avoided since otherwise Reflow may fail.
> The BBox option should be provided for tables and illustrations. The BBox should be exact; however, for tables only the lower left corner has to be set exactly. As an alternative to supplying a BBox entry, graphics could also be created within a BLSE tag, such as P, H, etc. However, vector graphics will not be displayed when Reflow is active. If the client does not provide the BBox option (and relies on automatic BBox generation instead), all table graphics, such as cell borders, should be drawn outside the table element.
> Table elements should only contain table-related elements (TR, TD, TH, THead, TBody, etc.) as child elements, but not any others. For example, using a Caption element within a table could result in reflow problems, although it would be correct Tagged PDF.
> Content covered by the Private tag will not be exported to other formats. However, it is subject to Reflow and Read Aloud, and illustrations within the Private tag must therefore have alternate text.
> Reflow seems to have problems with PDF documents generated with the topdown option.
> If an activated item contains only content, but no structure children, Reflow may fail, especially if the item is activated on another page. This problem can be avoided by wrapping the activated item with a non-inline Span tag.
> Acrobat cannot reflow pages with form fields, and will display a warning in this case.
> Acrobat 9 cannot reflow pages with digital signature fields, and will display a warning in this case.
> Every reflow problem disables the reflow feature and its menu item.
Acrobat's Accessibility Checker. Acrobat's accessibility checker can be used to determine the suitability of Tagged PDF documents for consumption with assistive technology such as a screen reader. Some hints:
> In order to make form fields accessible, use the tooltip option of PDF_create_field( ) and PDF_create_fieldgroup( ).
> If a page contains annotations, Acrobat reports that tab order may be inconsistent with the structure order.
Export to other formats with Acrobat. Tagged PDF can significantly improve the result of saving PDF documents in formats such as XML or RTF with Acrobat.
> If an imported PDF page has the Form tag, the text provided with the ActualText option will be exported to other formats in Acrobat, while the text provided with the Alt option will be ignored. However, the Read Aloud feature works for both options.
> The content of a NonStruct tag will not be exported to HTML 4.01 CSS 1.0 (but it will be used for HTML 3.2 export).
> Alternate text should be supplied for ILSEs (such as Code, Quote, or Reference). If the Alt option is used, Read Aloud will read the provided text, but the real content will be exported to other formats. If the ActualText option is used, the provided text will be used both for reading and exporting.
Acrobat's Read Aloud Feature. Tagged PDF will enhance Acrobat's capability to read text aloud.
> When supplying Alt or ActualText it is useful to include a space character at the beginning. This allows the Read Aloud feature to distinguish the text from the preceding sentence; otherwise Read Aloud will try to read the last word of the preceding sentence in combination with the first word of the alternate text. For the same reason, including a period (".") at the end may also be useful.
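A minimal sketch of the hints above, assuming the resulting string is afterwards passed as the value of the Alt or ActualText option of PDF_begin_item( ): the hypothetical Java helper below prefixes a space and appends a period unless one is already present.

```java
// Hypothetical helper which prepares alternate text for Acrobat's Read Aloud feature.
final class ReadAloudText {
    /** Prefixes a space and makes sure the text ends with a period. */
    static String forReadAloud(String text) {
        String trimmed = text.strip();
        if (!trimmed.endsWith(".")) {
            trimmed = trimmed + ".";
        }
        return " " + trimmed;
    }

    public static void main(String[] args) {
        System.out.println("[" + forReadAloud("Company logo") + "]"); // [ Company logo.]
    }
}
```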
Installing the PDFlib Block plugins for Acrobat 8/9 on the Mac. With Acrobat 8/9 on the Mac the plugin folder is not directly visible in the Finder. Instead of dragging the plugin files to the plugin folder, use the following steps (make sure that Acrobat is not running):
> Extract the plugin files to a folder by double-clicking the disk image.
> Locate the Acrobat application icon in the Finder. It is usually located in a folder which has a name similar to the following:
> Single-click on the Acrobat application icon and select File, Get Info.
> In the window that pops up click the triangle next to Plug-ins.
> Click Add... and select the PDFlib Block Plugin Acro X folder (where X designates your Acrobat version) from the folder which has been created in the first step.
Note that after installation this folder will not immediately show up in the list of plugins, but only when you open the info window next time.
Troubleshooting. If the PDFlib Block plugin doesn't seem to work, check the following:
> Make sure that in Edit, Preferences, [General...], General the box Use only certified plugins is unchecked. The plugins will not be loaded if Acrobat is running in Certified Mode.
> Some PDF forms created with Adobe Designer may prevent the Block plugin as well as other Acrobat plugins from working properly since they interfere with PDF's internal security model. For this reason we suggest avoiding Designer's static PDF forms, and only using dynamic PDF forms as input for the Block plugin.
type of Block the PDFlib API offers a dedicated function for processing the Block, e.g. PDF_fill_textblock( ). These functions search an imported PDF page for a Block by its name, analyze its properties, and place some client-supplied data (text, raster image, or PDF page) on the new page according to the corresponding Block properties.
Properties for default contents. Special Block properties can be defined which hold the default contents of a Block, i.e. the text, image, or PDF contents which will be placed in the Block if no variable data has been supplied to the Block filling functions, or in situations where the Block contents are currently constant, but may change in the next print run. Default properties play an important role for the Preview feature of the Block plugin (see Section 11.4, Previewing PDFlib Blocks in Acrobat, page 281).
Custom Block properties. Standard Block properties make it possible to quickly implement variable data processing applications, but these are restricted to the set of properties which are internally known to PDFlib and can automatically be processed. In order to provide more flexibility, the designer can also assign custom properties to a Block. These can be used to extend the Block concept in order to match the requirements of the most demanding variable data processing applications.
There are no rules for custom properties since PDFlib will not process custom properties in any way, except making them available to the client. The client code can examine the custom properties and act in whatever way it deems appropriate. Based on some custom property of a Block the code may make layout-related or data-gathering decisions. For example, a custom property for a scientific application could specify the number of digits for numerical output, or a database field name may be defined as a custom Block property for retrieving the data corresponding to this Block.
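As an illustration of the scientific example above, the following Java sketch formats a number according to a custom Block property. It assumes the custom properties have already been read into a Map (how they are retrieved from the Block is not shown); the property name digits and the default of two decimals are hypothetical.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical client code acting on a custom Block property; PDFlib does not interpret it.
final class CustomDigits {
    /** Formats value with the number of decimals given by the custom property "digits". */
    static String format(Map<String, String> customProperties, double value) {
        int digits = Integer.parseInt(customProperties.getOrDefault("digits", "2"));
        if (digits < 0) {
            throw new IllegalArgumentException("digits must not be negative");
        }
        return String.format(Locale.ROOT, "%." + digits + "f", value);
    }

    public static void main(String[] args) {
        System.out.println(format(Map.of("digits", "3"), Math.PI)); // 3.142
    }
}
```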
Table 11.1 Comparison of PDF form fields and PDFlib Blocks
feature | PDF form fields | PDFlib Blocks
design objective | for interactive use | for automated filling
typographic features (beyond choice of font and font size) | | kerning, word and character spacing, underline/overline/strikeout
OpenType layout features | | dozens of OpenType layout features, e.g. ligatures, swash characters, oldstyle figures
complex script support | limited | shaping and bidirectional formatting, e.g. for Arabic and Devanagari
font control | font embedding | font embedding and subsetting, encoding
text formatting controls | left-, center-, right-aligned | left-, center-, right-aligned, justified; various formatting algorithms and controls; inline options can be used to control the appearance of text
change font or other text attributes within text | | yes
merged result is integral part of PDF page description | | yes
users can edit merged field contents | yes | no
extensible set of properties | | yes (custom Block properties)
use image files for filling | | BMP, CCITT, GIF, PNG, JPEG, JBIG2, JPEG 2000, TIFF
color support | RGB | grayscale, RGB, CMYK, Lab, spot color (HKS and Pantone spot colors integrated in the Block plugin)
PDF/X- and PDF/A-conforming | PDF/X: no; PDF/A: restricted | yes (both template with Blocks and merged results)
graphics and text properties can be overridden upon filling | |
transparent contents | |
Text Blocks can be linked | |
from the PDFlib Blocks menu or the context menu. The position of one or more Blocks can also be changed in small increments by using the arrow keys. Alternatively, you can enter numerical Block coordinates in the properties dialog. The origin of the coordinate system is in the upper left corner of the page. The coordinates will be displayed in the unit which is currently selected in Acrobat:
> To change the display units in Acrobat 7/8/9 proceed as follows: go to Edit, Preferences, [General...], Units & Guides and choose one of Points, Inches, Millimeters, Picas, Centimeters. In Acrobat 7/8 you can alternatively go to View, Navigation Tabs, Info and select a unit from the Options menu.
> To display cursor coordinates use View, Cursor Coordinates in Acrobat 9 or View, Navigation Tabs, Info in Acrobat 7/8.
Note that the chosen unit will only affect the Rect property, but not any other numerical properties.
Using a grid to position Blocks. You can take advantage of Acrobat's grid feature for precisely positioning and resizing Blocks. Proceed as follows in Acrobat 7/8/9:
> Display the grid: View, Grid;
> Enable grid snapping: View, Snap to Grid;
> Change the grid (see Figure 11.2): go to Edit, Preferences, [General...], Units & Guides. Here you can change the spacing and position of the grid as well as the color of the grid lines.
If Snap to Grid is enabled the size and position of Blocks will be aligned with the configured grid. Snap to Grid affects newly generated Blocks as well as existing Blocks which are moved or resized with the Block tool.
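The conversions behind the displayed coordinates are fixed: 1 inch is 72 points, 1 pica is 12 points, and 1 inch is 25.4 mm. The Java sketch below converts between these units and also shows a simplified model of snapping to a grid (rounding to the nearest grid line). The class name is hypothetical, and the grid model is an assumption, not a description of Acrobat's internal behavior.

```java
// Sketch: unit conversion for Block coordinates plus a simplified Snap to Grid model.
final class BlockUnits {
    enum Unit {
        POINTS(1.0), INCHES(72.0), PICAS(12.0), MILLIMETERS(72.0 / 25.4), CENTIMETERS(72.0 / 2.54);

        final double pointsPerUnit; // how many points one unit corresponds to

        Unit(double pointsPerUnit) {
            this.pointsPerUnit = pointsPerUnit;
        }
    }

    static double fromPoints(double points, Unit unit) {
        return points / unit.pointsPerUnit;
    }

    static double toPoints(double value, Unit unit) {
        return value * unit.pointsPerUnit;
    }

    /** Rounds a coordinate to the nearest line of a grid with the given spacing and offset. */
    static double snap(double value, double spacing, double offset) {
        return offset + Math.round((value - offset) / spacing) * spacing;
    }

    public static void main(String[] args) {
        System.out.println(fromPoints(144, Unit.INCHES)); // 2.0
        System.out.println(snap(13, 10, 0));              // 10.0
    }
}
```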
Creating Blocks by selecting an image or graphic. As an alternative to manually dragging Block rectangles you can use existing page contents to define the Block size. First, make sure that the menu item PDFlib Blocks, Click Object to define Block is enabled. Now you can use the Block tool to click on an image on the page in order to create a Block with the size of the image. You can also click on other graphical objects, and the Block tool will try to select the surrounding graphic (e.g., a logo). The Click Object feature is intended as an aid for defining Blocks. If you want to reposition or resize the Block you can do so afterwards without any restriction. The Block will not be locked to the image or graphics object which was used as a positioning aid.
The Click Object feature will try to recognize which vector graphics and images form a logical element on the page. When some page content is clicked, its bounding box (the surrounding rectangle) will be selected unless the object is white or very large. In the next step other objects which are partially contained in the detected rectangle will be added to the selected area, and so on. The final area will be used as the basis for the generated Block rectangle. The end result is that the Click Object feature will try to select complete graphics, and not only individual lines.
Automatically detect font properties. The PDFlib Block plugin can analyze the underlying font which is present at the location where a Textline or Textflow Block is positioned, and can automatically fill in the corresponding properties of the Block:
fontname, fontsize, fillcolor, charspacing, horizscaling, wordspacing, textrendering, textrise
Since automatic detection of font properties can result in undesired behavior when the background should be ignored, it can be activated or deactivated using PDFlib Blocks, Detect underlying font and color. By default this feature is turned off.
Locking Blocks. Blocks can be locked to protect them against accidental moving, resizing, or deleting. With the Block tool active, select the Block and choose Lock from its context menu. While a Block is locked you cannot move, resize, or delete it, or display its properties dialog.
> Click on the first Block to select it. The first selected Block will be the master Block. Shift-click other Blocks to add them to the set of selected Blocks. Alternatively, click Edit, Select All to select all Blocks on the current page.
> Double-click within any of the Blocks to open the Block Properties dialog. The Block where you double-click will be the new master Block.
> Alternatively, you can click on a single Block to designate it as master Block, and then press the Enter key to open the Block Properties dialog.
The Properties dialog displays only the subset of properties which apply to all selected Blocks. The dialog will be populated with property values taken from the master Block. Closing the dialog with Apply will copy its current contents to all selected Blocks, i.e. the values of the master Block with any manual changes applied in the dialog. This behavior can be used to copy Block properties from a particular Block to one or more other Blocks. The following standard properties cannot be shared, i.e. they cannot be edited for multiple Blocks at once:
Name, Description, Subtype, Type, Rect, Status
Duplicating Blocks on other pages. You can create duplicates of one or more Blocks on an arbitrary number of pages in the current document simultaneously:
> Activate the Block tool and select the Blocks you want to duplicate.
> Choose Import and Export, Duplicate... from the PDFlib Blocks menu or the context menu.
> Choose which Blocks to duplicate (selected Blocks or all on the page) and the range of target pages where you want duplicates of the Blocks.
Exporting and importing Blocks. Using the export/import feature for Blocks it is possible to share the Block definitions on a single page or all Blocks in a document among multiple PDF files. This is useful for updating the page contents while maintaining existing Block definitions. To export Block definitions to a separate file proceed as follows:
> Activate the Block tool and select the Blocks you want to export.
> Choose Import and Export, Export... from the PDFlib Blocks menu or the context menu. Enter the page range and a file name for the file containing the Block definitions.
You can import Block definitions via PDFlib Blocks, Import and Export, Import... . Upon importing Blocks you can choose whether to apply the imported Blocks to all pages in the document, or only to a page range. If more than one page is selected the Block definitions will be copied unmodified to the pages. If there are more pages in the target range than in the imported Block definition file you can use the Repeat Template checkbox. If it is enabled, the sequence of Blocks in the imported file will be repeated in the current document until the end of the document is reached.
Copying Blocks to another document upon export. When exporting Blocks you can immediately apply them to the pages in another document, thereby propagating the Blocks from one document to another. In order to do so choose an existing document to export the Blocks to.
If you activate the checkbox Delete existing Blocks, all Blocks which may be present in the target document will be deleted before the new Blocks are copied into the document.
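The page assignment on import can be modelled as a small function. This Java sketch assumes that page i of the target range receives the Blocks of page i of the imported file, and that with Repeat Template enabled the imported pages are reused cyclically; without it, target pages beyond the end of the imported file receive no Blocks. The class and method names are hypothetical and not part of the Block plugin.

```java
/** Hypothetical model of how imported Block pages map onto a target page range. */
final class RepeatTemplateSketch {
    /**
     * @param targetIndex   0-based index of the page within the target range
     * @param importedPages number of pages in the imported Block definition file
     * @param repeat        whether the Repeat Template checkbox is enabled
     * @return 0-based page of the imported file whose Blocks are applied, or -1 for none
     */
    static int sourcePageFor(int targetIndex, int importedPages, boolean repeat) {
        if (importedPages <= 0) {
            throw new IllegalArgumentException("imported file must contain at least one page");
        }
        if (targetIndex < importedPages) {
            return targetIndex;            // one-to-one while imported pages are available
        }
        // past the end of the imported file: cycle through it again, or assign nothing
        return repeat ? targetIndex % importedPages : -1;
    }
}
```

For a two-page Block file applied to five pages, Repeat Template yields the sequence 0, 1, 0, 1, 0; without it, the last three pages stay empty in this model.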
Table 11.2 Conversion of PDF form fields to PDFlib Blocks
PDF form field attribute...                          ...will be converted to the PDFlib Block property
all fields
  Position                                           Rect
  Name                                               Name
  Tooltip                                            Description
  Appearance, Text, Font                             fontname
  Appearance, Text, Font Size                        fontsize; auto font size will be converted to a fixed font size of 2/3 of the Block height, and the fitmethod will be set to auto. For multi-line fields/Blocks this combination will automatically result in a suitable font size which may be smaller than the initial value of 2/3 of the Block height.
  Appearance, Text, Text Color                       strokecolor and fillcolor
  Appearance, Border, Border Color                   bordercolor
  Appearance, Border, Fill Color                     backgroundcolor
  Appearance, Border, Line Thickness                 linewidth: Thin=1, Medium=2, Thick=3
  General, Common Properties, Form Field             Status: Visible=active, Hidden=ignore, Visible but doesn't print=ignore, Hidden but printable=active
  General, Common Properties, Orientation            orientate: 0=north, 90=west, 180=south, 270=east
text fields
  Options, Default Value                             defaulttext
  Options, Alignment                                 position: Left={left center}, Center={center center}, Right={right center}
  Options, Multi-line                                textflow: checked=true (Textflow Block), unchecked=false (Textline Block)
radio buttons and check boxes
  If "Check box/Button is checked by default" is selected:
  Options, Check Box Style or Options, Button Style  defaulttext: Check=4, Circle=l, Cross=8, Diamond=u, Square=n, Star=H (these characters represent the respective symbols in the ZapfDingbats font)
list boxes and combo boxes
  Options, Selected (default) item                   defaulttext
buttons
  Options, Icon and Label, Label                     defaulttext
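Several rows of Table 11.2 are plain value mappings. The following Java sketch encodes some of them as lookup tables to make the correspondences explicit. It is illustrative only: the class and method names are hypothetical, and the sketch does not claim to reflect how the AcroFormConversion plugin is implemented.

```java
import java.util.Map;

/** Hypothetical lookup tables mirroring selected rows of Table 11.2. */
final class FormFieldMappingSketch {
    // Appearance, Border, Line Thickness -> linewidth
    private static final Map<String, Integer> LINE_WIDTH =
            Map.of("Thin", 1, "Medium", 2, "Thick", 3);

    // General, Common Properties, Form Field -> Status
    private static final Map<String, String> STATUS = Map.of(
            "Visible", "active",
            "Hidden", "ignore",
            "Visible but doesn't print", "ignore",
            "Hidden but printable", "active");

    // General, Common Properties, Orientation (degrees) -> orientate
    private static final Map<Integer, String> ORIENTATE =
            Map.of(0, "north", 90, "west", 180, "south", 270, "east");

    // Check Box Style / Button Style -> defaulttext (ZapfDingbats characters)
    private static final Map<String, String> CHECK_STYLE = Map.of(
            "Check", "4", "Circle", "l", "Cross", "8",
            "Diamond", "u", "Square", "n", "Star", "H");

    private static <K, V> V lookup(Map<K, V> table, K key) {
        V value = table.get(key);
        if (value == null) {
            throw new IllegalArgumentException("no mapping for " + key);
        }
        return value;
    }

    static int linewidth(String lineThickness) { return lookup(LINE_WIDTH, lineThickness); }

    static String status(String visibility) { return lookup(STATUS, visibility); }

    static String orientate(int degrees) { return lookup(ORIENTATE, degrees); }

    static String checkDefaulttext(String style) { return lookup(CHECK_STYLE, style); }

    /** An auto font size becomes a fixed size of 2/3 of the Block height (fitmethod=auto is set too). */
    static double fontsizeForAuto(double blockHeight) { return blockHeight * 2.0 / 3.0; }
}
```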
Multiple form fields with the same name. Multiple form fields on the same page are allowed to have the same name, while Block names must be unique on a page. When converting form fields to Blocks a numerical suffix will therefore be added to the name of generated Blocks in order to create unique Block names (see also Associating form fields with corresponding Blocks, page 279). Note that due to a problem in Acrobat the field attributes of form fields with the same names are not reported correctly. If multiple fields have the same name, but different attributes, these differences will not be reflected in the generated Blocks. The conversion process will issue a warning in this case and provide the names of affected form fields. In this case you should carefully check the properties in the generated Blocks.
Associating form fields with corresponding Blocks. Since the form field names will be modified when converting multiple fields with the same name (e.g. radio buttons), it is difficult to reliably identify the Block which corresponds to a particular form field. This is especially important when using an FDF or XFDF file as the source for filling Blocks such that the final result resembles the filled form. In order to solve this problem the AcroFormConversion plugin records details about the original form field as custom properties when creating the corresponding Block. Table 11.3 details the custom properties which can be used to reliably identify the Blocks; all properties have type string.
Table 11.3 Custom properties for identifying the original form field corresponding to the Block
custom property          meaning
PDFlib:field:name        Fully qualified name of the form field
PDFlib:field:pagenumber  Page number (as a string) in the original document where the form field was located
PDFlib:field:type        Type of the form field; one of pushbutton, checkbox, radiobutton, listbox, combobox, textfield, signature
PDFlib:field:value       (Only for type=checkbox) Export value of the form field
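Once the custom properties from Table 11.3 have been read from the Block PDF, associating form data with Blocks reduces to a lookup on PDFlib:field:name, plus PDFlib:field:value for check boxes. The Java sketch below works on a hypothetical in-memory list of Blocks rather than on any PDFlib API. The BlockInfo type is an assumption, and the Block and field names in the test data are invented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical lookup of generated Blocks via the custom properties of Table 11.3. */
final class FieldBlockLookup {
    /** In-memory view of a Block: its (possibly suffixed) name and its custom properties. */
    static final class BlockInfo {
        final String name;
        final Map<String, String> custom;

        BlockInfo(String name, Map<String, String> custom) {
            this.name = name;
            this.custom = custom;
        }
    }

    /** Names of all Blocks generated from the form field with the given fully qualified name. */
    static List<String> blocksForField(List<BlockInfo> blocks, String fieldName) {
        List<String> names = new ArrayList<>();
        for (BlockInfo b : blocks) {
            if (fieldName.equals(b.custom.get("PDFlib:field:name"))) {
                names.add(b.name);
            }
        }
        return names;
    }

    /** For check boxes: the Block whose recorded export value matches, or null if none does. */
    static String checkboxBlock(List<BlockInfo> blocks, String fieldName, String exportValue) {
        for (BlockInfo b : blocks) {
            if (fieldName.equals(b.custom.get("PDFlib:field:name"))
                    && "checkbox".equals(b.custom.get("PDFlib:field:type"))
                    && exportValue.equals(b.custom.get("PDFlib:field:value"))) {
                return b.name;
            }
        }
        return null;
    }
}
```

Because the lookup keys on the recorded field name rather than on the Block name, it does not depend on the suffix format used for the generated Blocks.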
Binding Blocks to the corresponding form fields. In order to keep PDF form fields and the generated PDFlib Blocks synchronized, the generated Blocks can be bound to the corresponding form fields. This means that the Block tool will internally maintain the relationship of form fields and Blocks. When the conversion process is activated again, bound Blocks will be updated to reflect the attributes of the corresponding PDF form fields. Bound Blocks are useful to avoid duplicate work: when a form is updated for interactive use, the corresponding Blocks can automatically be updated, too.
If you do not want to keep the converted form fields after Blocks have been generated, you can choose the option Delete converted Form Fields in the PDFlib Blocks, Convert Form Fields, Conversion Options... dialog. This option will permanently remove the form fields after the conversion process. Any actions (e.g., JavaScript) associated with the affected fields will also be removed from the document.
Batch conversion. If you have many PDF documents with form fields that you want to convert to PDFlib Blocks, you can automatically process an arbitrary number of documents using the batch conversion feature. The batch processing dialog is available via PDFlib Blocks, Convert Form Fields, Batch conversion...:
> The input files can be selected individually; alternatively the full contents of a folder can be processed.
> The output files can be written to the same folder where the input files are, or to a different folder. The output files can receive a prefix to their name in order to distinguish them from the input files.
> When processing a large number of documents it is recommended to specify a log file. After the conversion it will contain a full list of processed files as well as details regarding the result of each conversion along with possible error messages.
During the conversion process the converted PDF documents will be visible in Acrobat, but you cannot use any of Acrobat's menu functions or tools.
> Automatically save the Block PDF before creating the Preview;
> Add Block info layers and annotations;
> Clone PDF/A-1b or PDF/X status; since these standards restrict the use of layers, the Block info layers option is mutually exclusive with this option.
> The Advanced PPS options dialog can be used to specify additional option lists for PPS functions according to the PPS API. For example, the searchpath option for PDF_set_option( ) can be used to specify a directory where images for Block filling are located. It is recommended to specify advanced options in cooperation with the programmer who writes the PPS code.
The Preview configuration can be saved to a disk file and later be reloaded.
Information provided with the Preview. The generated Preview documents contain various pieces of information in addition to the original page contents (the background) and the filled Blocks. This information can be useful for checking and improving Blocks and PPS configuration. The following items will be created for each processed Block (remember that Blocks without default contents will be skipped):
> Error markers: Blocks which cannot be filled successfully will be visualized by a crossed-out rectangle so that they can easily be identified. Error markers will always be created if a Block couldn't be processed.
> Bookmarks: The processed Blocks will be summarized in bookmarks which are structured according to the page number, the type of the Block, and possible errors. Bookmarks can be displayed via View, Navigation Panels, Bookmarks. Bookmarks will always be created.
> Annotations: For each processed Block an annotation will be created on the page in addition to the actual Block contents. The annotation rectangle visualizes the original Block boundary (depending on the Block contents and filling mode this may be different from the visible contents). The annotation contains the name of the Block and an error message if the Block couldn't be filled.
Annotations are visible by default, but can be disabled in the Preview configuration.
> Layers: The page contents will be placed on layers to facilitate analysis and debugging. A separate layer will be created for the page background (i.e. the contents of the original page), each Block type, error Blocks which couldn't be filled, and the annotations with Block information. Empty layers will be skipped. The layer list can be displayed via View, Navigation Panels, Layers. By default, all layers on the page will be displayed. In order to hide the contents of a layer click on the eye symbol to the left of the layer name. Layer creation can be disabled in the Preview configuration. Since the use of layers is restricted in PDF/A-1 and PDF/X, layers are not created if the Clone option is enabled.
Fig. 11.4 Left: Container document with Blocks; right: preview PDF with Block info layers and annotations
Cloning the PDF/A or PDF/X status. The Clone PDF/A-1b or PDF/X status configuration is useful when PDF output according to the PDF/A or PDF/X standards must be created. Clone mode can be enabled if the input conforms to one of the following standards:
> PDF/A-1b:2005
> PDF/X-1a:2001, PDF/X-1a:2003
> PDF/X-3:2002, PDF/X-3:2003
> PDF/X-4, PDF/X-4p
> PDF/X-5g, PDF/X-5pg
When Previews are created in clone mode, PPS will duplicate the following aspects of the Block PDF in the generated Preview PDF:
> the PDF standard identification;
> output intent condition;
> page sizes including all page boxes;
> XMP document metadata.
When cloning standard-conforming PDF documents all Block filling operations must conform to the respective standard. For example, if no output intent is present, RGB images without ICC profile cannot be used. Similarly, all used fonts must be embedded. The full list of requirements can be found in Section 10.4, PDF/X for Print Production, page 242, and Section 10.5, PDF/A for Archiving, page 249. If a Block filling operation in PDF/A or PDF/X cloning mode would violate the selected standard (e.g. because a default image uses RGB color space, but the document does not contain a suitable output intent), an error message pops up and no Preview will be generated. This way users can catch potential standard violations very early in the workflow.
Cookbook A full code sample can be found in the Cookbook topic blocks/starter_block.
Overriding Block properties. In certain situations the programmer wants to use only some of the properties provided in a Block definition, but override other properties with custom values. This can be useful in various situations:
> The scaling factor for an image or PDF page is calculated instead of being taken from the Block definition.
> The Block coordinates are changed programmatically, for example when generating an invoice with a variable number of data items.
> Individual spot color names are supplied in order to match customer requirements in a print shop application.
Property overrides can be achieved by supplying property names and the corresponding values in the option list of all PDF_fill_*block( ) functions as follows:
p.fill_textblock(page, "firstname", "Serge", "fontsize=12 encoding=winansi");
This will override the Block's internal fontsize property with the supplied value 12. Almost all property names can be used as options. Property overrides apply only to the respective function calls; they will not be stored in the Block definition.
Controlling the display order of imported page and Blocks. The imported page must have been placed on the output page before using any of the Block filling functions. This means that the original page will usually be placed below the Block contents. However, in some situations it may be desirable to place the original page on top of the filled Blocks. This can be achieved with the blind option of PDF_fit_pdi_page( ):
/* Place the page in blind mode to prepare the Blocks, without the page being visible */
p.fit_pdi_page(page, 0.0, 0.0, "blind");
p.fill_textblock(page, "firstname", "Serge", "encoding=winansi");
/* ... fill more blocks ... */

/* Place the page again, this time visible */
p.fit_pdi_page(page, 0.0, 0.0, "");
Cookbook A full code sample can be found in the Cookbook topic blocks/block_below_contents.
Duplicating Blocks. Imported Blocks can also be useful as placeholders without any reference to the underlying contents of the page containing the Blocks. You can import a page with Blocks in blind mode on one or more pages, i.e. with the blind option of PDF_fit_pdi_page( ), and subsequently fill the Blocks. This way you can take advantage of the Block and its properties, and can even duplicate Blocks on multiple pages (or even on the same output page).
Cookbook A full code sample can be found in the Cookbook topic blocks/duplicate_block.
Linking Textflow Blocks. Textflow Blocks can be linked so that one Block holds the overflow text from a previous Block. For example, if you have long variable text which may need to be continued on another page, you can link two Blocks and fill the text which is still available after filling the first Block into the second Block.
PPS internally creates a Textflow from the text provided to PDF_fill_textblock( ) and the Block properties. For unlinked Blocks this Textflow will be placed in the Block and the corresponding Textflow handle will be deleted at the end of the call; overflow text will be lost. With linked Textflow Blocks the overflow text which remains after filling the first Block can be filled into the next Block. The remainder of the Textflow will be used as Block contents instead of creating a new Textflow. Linking Textflow Blocks works as follows:
> In the first call to PDF_fill_textblock( ) within a chain of linked Textflow Blocks the value -1 (in PHP: 0) must be supplied for the textflowhandle option. The Textflow handle created internally will be returned by PDF_fill_textblock( ), and must be stored by the user.
> In the next call to PDF_fill_textblock( ) the Textflow handle returned in the previous step can be supplied to the textflowhandle option (the text supplied in the text parameter will be ignored in this case, and should be empty). The Block will be filled with the remainder of the Textflow.
> This process can be repeated with more Textflow Blocks.
> The returned Textflow handle can be supplied to PDF_info_textflow( ) in order to determine the results of Block filling, e.g. the end condition or the end position of the text.
Note that the fitmethod property should be set to clip (this is the default anyway if textflowhandle is supplied). The basic code fragment for linking Textflow Blocks looks as follows:
p.fit_pdi_page(page, 0.0, 0.0, "");
tf = -1;

for (i = 0; i < blockcount; i++) {
    String optlist = "encoding=winansi textflowhandle=" + tf;
    tf = p.fill_textblock(page, blocknames[i], text, optlist);
    text = null;

    if (tf == -1)
        break;

    /* check result of most recent call to fit_textflow() */
    reason = (int) p.info_textflow(tf, "returnreason");
    result = p.get_parameter("string", (float) reason);

    /* end loop if all text was placed */
    if (result.equals("_stop")) {
        p.delete_textflow(tf);
        break;
    }
}
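The following Java sketch shows the control flow of the loop without PDFlib. It replaces Blocks with hypothetical word capacities and the Textflow with a word list plus a read position. The first iteration creates the Textflow from the text (the analog of textflowhandle=-1), later iterations continue with the remainder, and the loop ends as soon as the reason is _stop. Any text left over after the last Block remains unplaced. Nothing here calls the PPS API; the class and method names are illustrative.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Simulation of linked Textflow Blocks; a Block is modelled as a capacity in words. */
final class LinkedBlocksSimulation {
    /** Stand-in for the internal Textflow: the words of the text plus a read position. */
    static final class Textflow {
        private final List<String> words;
        private int pos = 0;

        Textflow(String text) {
            words = Arrays.asList(text.trim().split("\\s+"));
        }

        /** Places as many remaining words as fit; returns "_stop" once all text is placed. */
        String fit(int capacity, List<String> out) {
            int end = Math.min(words.size(), pos + capacity);
            out.addAll(words.subList(pos, end));
            pos = end;
            return pos == words.size() ? "_stop" : "_boxfull";
        }
    }

    /** Fills the Blocks in order and returns the words placed in each of them. */
    static List<List<String>> fillLinked(String text, int[] blockCapacities) {
        List<List<String>> result = new ArrayList<>();
        Textflow tf = null;                        // no handle yet, like textflowhandle=-1
        for (int capacity : blockCapacities) {
            if (tf == null) {
                tf = new Textflow(text);           // first Block: create the Textflow from text
            }
            List<String> placed = new ArrayList<>();
            String reason = tf.fit(capacity, placed); // later Blocks continue with the remainder
            result.add(placed);
            if (reason.equals("_stop")) {
                break;                             // all text placed: stop filling Blocks
            }
        }
        return result;
    }
}
```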
Cookbook A full code sample can be found in the Cookbook topic blocks/linked_textblocks.
Block filling order. The Block functions PDF_fill_*block( ) process properties and Block contents in the following order:
> Background: if the backgroundcolor property is present and contains a color space keyword different from None, the Block area will be filled with the specified color.
> Border: if the bordercolor property is present and contains a color space keyword different from None, the Block border will be stroked with the specified color and linewidth.
> Contents: the supplied Block contents and all other properties except bordercolor and linewidth will be processed.
> Textline and Textflow Blocks: if neither text nor default text has been supplied, there won't be any output at all, not even background color or Block border.
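The filling order above, including the special case for empty Textline and Textflow Blocks, can be summarized as a list of steps. In this Java sketch color properties are passed as strings whose first token is the color space keyword (e.g. gray 0). An absent property is treated like None, and the keyword comparison ignores case; both choices are assumptions. The sketch only reports the steps and draws nothing.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical summary of the Block filling order; it reports steps, it does not draw. */
final class BlockFillOrderSketch {
    /** True if a color property is present and its color space keyword is not None. */
    static boolean colorSet(String colorProperty) {
        if (colorProperty == null || colorProperty.trim().isEmpty()) {
            return false;                                  // absent property: like None
        }
        String keyword = colorProperty.trim().split("\\s+")[0];
        return !keyword.equalsIgnoreCase("none");
    }

    static List<String> steps(boolean textBlock, String text, String defaulttext,
                              String backgroundcolor, String bordercolor) {
        List<String> steps = new ArrayList<>();
        boolean noText = (text == null || text.isEmpty())
                && (defaulttext == null || defaulttext.isEmpty());
        if (textBlock && noText) {
            return steps;                    // no output at all, not even background or border
        }
        if (colorSet(backgroundcolor)) {
            steps.add("fill background");    // 1. backgroundcolor
        }
        if (colorSet(bordercolor)) {
            steps.add("stroke border");      // 2. bordercolor with linewidth
        }
        steps.add("place contents");         // 3. contents and all remaining properties
        return steps;
    }
}
```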
Locked Name
Subtype textflow
Type
backgroundcolor  (Color) If this property is present and contains a color space keyword different from None, a rectangle will be drawn and filled with the supplied color. This may be useful to cover existing page contents. Default: None
bordercolor  (Color) If this property is present and contains a color space keyword different from None, a rectangle will be drawn and stroked with the supplied color. Default: None
linewidth  (Float; must be greater than 0) Stroke width of the line used to draw the Block rectangle; only used if bordercolor is set. Default: 1
Rect  (Rectangle; required) The Block coordinates. The origin of the coordinate system is in the lower left corner of the page. However, the Block plugin will display the coordinates in Acrobat's notation, i.e., with the origin in the upper left corner of the page. The coordinates will be displayed in the unit which is currently selected in Acrobat, but will always be stored in points in the PDF file.
Status  (Keyword) Describes how the Block will be processed (default: active):
    active  The Block will be fully processed according to its properties.
    ignore  The Block will be ignored.
    static  No variable contents will be placed; instead, the Block's default text, image, or PDF contents will be used if available.
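The Rect description implies two conversions between the coordinates shown in the Block plugin and the values stored in the file: display unit to points, and top-left to bottom-left origin. The following Java sketch illustrates both. The unit abbreviations and method names are hypothetical. It also assumes the page coordinate system starts at (0, 0) with the given page height; a page whose MediaBox does not start at the origin would need an additional offset.

```java
/** Hypothetical helpers relating Acrobat's displayed Block coordinates to PDF points. */
final class BlockCoordinatesSketch {
    /** Converts a value in one of Acrobat's display units to points (1 inch = 72 points). */
    static double toPoints(double value, String unit) {
        switch (unit) {
            case "pt": return value;                  // Points
            case "in": return value * 72.0;           // Inches
            case "mm": return value * 72.0 / 25.4;    // Millimeters
            case "cm": return value * 720.0 / 25.4;   // Centimeters
            case "pc": return value * 12.0;           // Picas: 1 pica = 12 points
            default: throw new IllegalArgumentException("unknown unit: " + unit);
        }
    }

    /** Acrobat measures y downward from the top edge; PDF measures y upward from the bottom. */
    static double yFromTop(double yFromTopInPoints, double pageHeightInPoints) {
        return pageHeightInPoints - yFromTopInPoints;
    }
}
```

For example, a point shown 100 pt below the top edge of a page 842 pt high lies at y = 742 in the PDF coordinate system.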
opacityfill opacitystroke
Table 11.7 Text appearance properties for Textline and Textflow Blocks
keyword  possible values and explanation
charspacing  (Float or percentage) Character spacing. Percentages are based on fontsize. Default: 0
decorationabove  (Boolean) If true, the text decoration enabled with the underline, strikeout, and overline options will be drawn above the text, otherwise below the text. Changing the drawing order affects visibility of the decoration lines. Default: false
fillcolor  (Color) Fill color of the text. Default: gray 0 (=black)
fontname¹  (String) Name of the font as required by PDF_load_font( ). The PDFlib Block plugin will present a list of system-installed fonts. However, these font names may not be portable between Mac, Windows, and Unix systems. The encoding for the text must be specified as an option for PDF_fill_textblock( ) when filling the Block unless the font option has been supplied.
fontsize²  (Float) Size of the font in points
fontstyle  (Keyword) Font style, must be one of normal, bold, italic, or bolditalic
horizscaling  (Float or percentage) Horizontal text scaling. Default: 100%
italicangle  (Float) Italic angle of text in degrees. Default: 0
kerning  (Boolean) Kerning behavior. Default: false
monospace  (Integer: 1...2048) Forces the same width for all characters in the font. Default: absent (metrics from the font will be used)
overline  (Boolean) Overline mode. Default: false
strikeout  (Boolean) Strikeout mode. Default: false
strokecolor  (Color) Stroke color of the text. Default: gray 0 (=black)
strokewidth  (Float, percentage, or keyword; only effective if textrendering is set to outline text) Line width for outline text (in user coordinates or as a percentage of the fontsize). The keyword auto or the value 0 uses a built-in default. Default: auto
textrendering  (Integer) Text rendering mode. Only the value 3 has an effect on Type 3 fonts. Default: 0. Supported values:
    0  fill text
    1  stroke text (outline)
    2  fill and stroke text
    3  invisible text
    4  fill text and add it to the clipping path
    5  stroke text and add it to the clipping path
    6  fill and stroke text and add it to the clipping path
    7  add text to the clipping path
textrise  (Float or percentage) Text rise parameter. Percentages are based on fontsize. Default: 0
underline  (Boolean) Underline mode. Default: false
underlineposition  (Float, percentage, or keyword) Position of the stroked line for underlined text relative to the baseline. Percentages are based on fontsize. Default: auto
underlinewidth  (Float, percentage, or keyword) Line width for underlined text. Percentages are based on fontsize. Default: auto
wordspacing  (Float or percentage) Word spacing. Percentages are based on fontsize. Default: 0
1. This property is required in Textline and Textflow Blocks; it will automatically be enforced by the PDFlib Block plugin.
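Several entries in Table 11.7 accept either an absolute value or a percentage of fontsize: charspacing, strokewidth, textrise, wordspacing, and the underline settings. Here is a minimal Java sketch of how such a value resolves, assuming it arrives as a plain string such as 10% or 1.5. The helper is hypothetical and not part of the PPS API. It does not apply to horizscaling, whose percentage is a scaling factor rather than a fraction of fontsize.

```java
/** Hypothetical resolver for "Float or percentage" values whose percentages are based on fontsize. */
final class FontsizePercentSketch {
    static double resolve(String value, double fontsize) {
        String v = value.trim();
        if (v.endsWith("%")) {
            double percent = Double.parseDouble(v.substring(0, v.length() - 1).trim());
            return percent / 100.0 * fontsize;   // e.g. 10% at fontsize 12 -> 1.2
        }
        return Double.parseDouble(v);            // absolute value, used as is
    }
}
```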
    no<name>  The prefix no in front of a feature name (e.g. noliga) disables this feature.
  Default: _none for horizontal writing mode, vert for vertical writing mode. The readfeatures option in PDF_load_font( ) is required for OpenType feature support.
language  (Keyword; only relevant if script is supplied) The text will be processed according to the specified language, which is relevant for the features and shaping options. A full list of keywords can be found in the PDFlib Tutorial, e.g. ARA (Arabic), JAN (Japanese), HIN (Hindi). Default: _none (undefined language)
script  (Keyword; required if shaping=true) The text will be processed according to the specified script, which is relevant for the features, shaping, and advancedlinebreaking options. The most common keywords for scripts are the following: _none (undefined script), latn, grek, cyrl, armn, hebr, arab, deva, beng, guru, gujr, orya, taml, thai, laoo, tibt, hang, kana, han. A full list of keywords can be found in the PDFlib Tutorial. Default: _none
shaping  (Boolean) If true, the text will be formatted (shaped) according to the script and language options. The script option must have a value different from _none and the required shaping tables must be available in the font. Default: false
advancedlinebreak  (Boolean) Enable the advanced line breaking algorithm which is required for complex scripts. This is required for linebreaking in scripts which do not use space characters for designating word boundaries, e.g. Thai. The options locale and script will be honored. Default: false
alignment  (Keyword) Specifies formatting for lines in a paragraph. Default: left.
    left     left-aligned, starting at leftindent
    center   centered between leftindent and rightindent
    right    right-aligned, ending at rightindent
    justify  left- and right-aligned
avoidemptybegin  (Boolean) If true, empty lines at the beginning of a fitbox will be deleted. Default: false
fixedleading  (Boolean) If true, the first leading value found in each line will be used. Otherwise the maximum of all leading values in the line will be used. Default: false
hortabmethod  (Keyword) Treatment of horizontal tabs in the text. If the calculated position is to the left of the current text position, the tab will be ignored (default: relative):
    relative    The position will be advanced by the amount specified in hortabsize.
    typewriter  The position will be advanced to the next multiple of hortabsize.
    ruler       The position will be advanced to the n-th tab value in the ruler option, where n is the number of tabs found in the line so far. If n is larger than the number of tab positions the relative method will be applied.
hortabsize  (Float or percentage) Width of a horizontal tab¹. The interpretation depends on the hortabmethod option. Default: 7.5%
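The three hortabmethod variants can be compared side by side in a small Java sketch. It models the rules as stated in the table, not PDFlib's layout engine. Positions are plain user-space x values, and hortabsize is assumed to be already resolved to user coordinates. The sketch assumes n counts the tabs in the current line including this one, so the first tab uses the first ruler entry.

```java
/** Hypothetical model of the hortabmethod rules; not PDFlib's actual layout code. */
final class HortabSketch {
    /**
     * Returns the x position where text continues after a tab.
     * n is the 1-based number of this tab within the current line (assumption).
     */
    static double afterTab(String hortabmethod, double currentX, double hortabsize,
                           double[] ruler, int n) {
        double target;
        switch (hortabmethod) {
            case "relative":
                target = currentX + hortabsize;
                break;
            case "typewriter":
                // next multiple of hortabsize strictly to the right of currentX
                target = (Math.floor(currentX / hortabsize) + 1) * hortabsize;
                break;
            case "ruler":
                // fall back to the relative method if there are fewer ruler entries than tabs
                target = (n <= ruler.length) ? ruler[n - 1] : currentX + hortabsize;
                break;
            default:
                throw new IllegalArgumentException("unknown hortabmethod: " + hortabmethod);
        }
        // a tab whose position lies left of the current text position is ignored
        return (target < currentX) ? currentX : target;
    }
}
```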
Table 11.9 Text formatting properties (mostly for Textflow Blocks)
keyword  possible values and explanation
lastalignment  (Keyword) Formatting for the last line in a paragraph. All keywords of the alignment option are supported, plus the following (default: auto):
    auto  Use the value of the alignment option unless it is justify. In the latter case left will be used.
leading  (Float or percentage) Distance between adjacent text baselines in user coordinates, or as a percentage of the font size. Default: 100%
locale  (Keyword) The locale which will be used for localized linebreaking methods if advancedlinebreak=true. The keyword consists of one or more components, where the optional components are separated by an underscore character _ (the syntax slightly differs from NLS/POSIX locale IDs):
    > A required two- or three-letter lowercase language code according to ISO 639-2 (see www.loc.gov/standards/iso639-2), e.g. en (English), de (German), ja (Japanese). This differs from the language option.
    > An optional four-letter script code according to ISO 15924 (see www.unicode.org/iso15924/iso15924codes.html), e.g. Hira (Hiragana), Hebr (Hebrew), Arab (Arabic), Thai (Thai).
    > An optional two-letter uppercase country code according to ISO 3166 (see www.iso.org/iso/country_codes/iso_3166_code_lists), e.g. DE (Germany), CH (Switzerland), GB (United Kingdom).
  Specifying a locale is not required for advanced line breaking: the keyword _none specifies that no locale-specific processing will be done. Default: _none
  Examples: de_DE, en_US, en_GB
maxspacing, minspacing  (Float or percentage) The maximum or minimum distance between words (in user coordinates, or as a percentage of the width of the space character). The calculated word spacing is limited by the provided values (but the wordspacing option will still be added). Defaults: minspacing=50%, maxspacing=500%
minlinecount  (Integer) Minimum number of lines in the last paragraph of the fitbox. If there are fewer lines they will be placed in the next fitbox. The value 2 can be used to prevent single lines of a paragraph at the end of a fitbox (orphans). Default: 1
nofitlimit  (Float or percentage) Lower limit for the length of a line with the nofit method (in user coordinates or as a percentage of the width of the fitbox). Default: 75%
parindent  (Float or percentage) Left indent of the first line of a paragraph¹. The amount will be added to leftindent. Specifying this option within a line will act like a tab. Default: 0
rightindent, leftindent  (Float or percentage) Right or left indent of all text lines¹. If leftindent is specified within a line and the determined position is to the left of the current text position, this option will be ignored for the current line. Default: 0
ruler  (List of floats or percentages) List of absolute tab positions for hortabmethod=ruler¹. The list may contain up to 32 non-negative entries in ascending order. Default: integer multiples of hortabsize
shrinklimit  (Percentage) Lower limit for compressing text with the shrink method; the calculated shrinking factor is limited by the provided value, but will be multiplied with the value of the horizscaling option. Default: 85%
spreadlimit  (Float or percentage) Upper limit for the distance between two characters for the spread method (in user coordinates or as a percentage of the font size); the calculated character distance will be added to the value of the charspacing option. Default: 0
stamp  (Keyword; Textline and Textflow Blocks) This option can be used to create a diagonal stamp within the Block rectangle. The text comprising the stamp will be as large as possible. The options position, fitmethod, and orientate (only north and south) will be honored when placing the stamp text in the box. Default: none.
    ll2ur  The stamp will run diagonally from the lower left corner to the upper right corner.
    ul2lr  The stamp will run diagonally from the upper left corner to the lower right corner.
    none   No stamp will be created.
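The component rules for locale keywords can be expressed as a regular expression. The following Java sketch assumes the components always appear in the order listed (language, then script, then country) and that script codes use the capitalization shown in the examples. It checks syntax only; it does not verify codes against the ISO lists. The class name is hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical syntax check for locale keywords such as de_DE or ja_Hira. */
final class LocaleKeywordSketch {
    /** The components of a locale keyword; script and country are null when absent. */
    static final class Parts {
        final String language;
        final String script;
        final String country;

        Parts(String language, String script, String country) {
            this.language = language;
            this.script = script;
            this.country = country;
        }
    }

    // language (2-3 lowercase letters), optional _Script (Xxxx), optional _COUNTRY (XX)
    private static final Pattern LOCALE =
            Pattern.compile("([a-z]{2,3})(?:_([A-Z][a-z]{3}))?(?:_([A-Z]{2}))?");

    /** Returns null for _none; throws for keywords that do not match the component syntax. */
    static Parts parse(String keyword) {
        if (keyword.equals("_none")) {
            return null;                      // no locale-specific processing
        }
        Matcher m = LOCALE.matcher(keyword);
        if (!m.matches()) {
            throw new IllegalArgumentException("not a valid locale keyword: " + keyword);
        }
        return new Parts(m.group(1), m.group(2), m.group(3));
    }
}
```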
Table 11.9 Text formatting properties (mostly for Textflow Blocks)
keyword  possible values and explanation
tabalignchar  (Integer) Unicode value of the character at which decimal tabs will be aligned. Default: the . character (U+002E)
tabalignment  (List of keywords) Alignment for tab stops. Each entry in the list defines the alignment for the corresponding entry in the ruler option (default: left):
    center   Text will be centered at the tab position.
    decimal  The first instance of tabalignchar will be left-aligned at the tab position. If no tabalignchar is found, right alignment will be used instead.
    left     Text will be left-aligned at the tab position.
    right    Text will be right-aligned at the tab position.
1. In user coordinates, or as a percentage of the width of the fit box
2. Tab settings can be edited in the Ruler Tabs group in the Block properties dialog.
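The decimal entry of tabalignment positions text so that the first tabalignchar sits exactly at the tab stop, with right alignment as the fallback. Here is a Java sketch of that rule under the simplifying assumption that every character has the same advance width; real fonts need per-glyph widths. The class and method names are hypothetical.

```java
/** Hypothetical model of decimal tab alignment with a fixed character width. */
final class DecimalTabSketch {
    /** Returns the x position where the text starts for a decimal tab stop at tabPos. */
    static double startX(double tabPos, String text, char tabalignchar, double charWidth) {
        int idx = text.indexOf(tabalignchar);
        if (idx < 0) {
            return tabPos - text.length() * charWidth;  // no tabalignchar: right alignment
        }
        return tabPos - idx * charWidth;                // tabalignchar left-aligned at tabPos
    }
}
```

With a tab stop at 100 and a character width of 5, the text 12.50 starts at 90, so the period begins exactly at the stop; the text 125 contains no period and ends at the stop instead.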
dpi
fitmethod
margin
(Float list; only for Textline Blocks) One or two float values describing additional horizontal and vertical reduction of the Block rectangle. Default: 0
minfontsize
(Float or percentage; only for Textlines) Minimum allowed font size when text is scaled down to fit into the Block rectangle with fitmethod=auto when shrinklimit is exceeded. The limit is specified in user coordinates or as a percentage of the height of the Block. If the limit is reached, the text will be created with the specified minfontsize as fontsize. Default: 0.1%
orientate
(Keyword) Specifies the desired orientation of the content when it is placed. Possible values are north, east, south, west. Default: north
position
(Float list) One or two values specifying the position of the reference point within the content. The position is specified as a percentage within the Block. Default: {0 0}, i.e. the lower left corner
rotate
(Float) Rotation angle in degrees by which the Block will be rotated counter-clockwise before processing begins. The reference point is the center of the rotation. Default: 0
scale
(Float list) One or two values specifying the desired scaling factor(s) in horizontal and vertical direction. This option will be ignored if the fitmethod property has been supplied with one of the keywords auto, meet, slice, or entire. Default: 1
shrinklimit
(Float or percentage; only for Textlines) The lower limit of the shrinkage factor which will be applied to fit text with fitmethod=auto. Default: 0.75
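The interplay of fitmethod=auto, shrinklimit, and minfontsize for Textline Blocks can be sketched as follows. The table states the first two steps. The exact font size chosen once shrinklimit is exceeded is an assumption of this sketch: the size is reduced proportionally without horizontal condensing and clamped at minfontsize. Treat the result as an approximation, not as PDFlib's algorithm. All names are hypothetical.

```java
/**
 * Sketch of a fitmethod=auto decision for a Textline Block, assuming:
 * 1) text that fits is left unchanged;
 * 2) otherwise it is condensed horizontally, but not below shrinklimit;
 * 3) if shrinklimit is not enough, the font size is reduced instead
 *    (ASSUMPTION: proportionally, without condensing), but never below
 *    minfontsize.
 */
final class AutoFit {
    /** Font size and horizontal scaling chosen for the text. */
    static final class Result {
        final double fontSize;
        final double horizontalScale;

        Result(double fontSize, double horizontalScale) {
            this.fontSize = fontSize;
            this.horizontalScale = horizontalScale;
        }

        @Override
        public String toString() {
            return "fontsize=" + fontSize + " scale=" + horizontalScale;
        }
    }

    static Result fit(double textWidth, double boxWidth, double fontSize,
                      double shrinkLimit, double minFontSize) {
        if (textWidth <= boxWidth) {
            return new Result(fontSize, 1.0);              // fits as is
        }
        double factor = boxWidth / textWidth;              // shrinkage needed
        if (factor >= shrinkLimit) {
            return new Result(fontSize, factor);           // condense only
        }
        double reduced = Math.max(fontSize * factor, minFontSize);
        return new Result(reduced, 1.0);                   // smaller font instead
    }

    public static void main(String[] args) {
        // minfontsize default 0.1% of a 60-unit-high Block = 0.06 units.
        System.out.println(fit(110, 100, 12, 0.75, 0.06));
        System.out.println(fit(200, 100, 12, 0.75, 0.06)); // fontsize=6.0 scale=1.0
    }
}
```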
Table 11.11 Fitting properties for Textflow Blocks
firstlinedist
(Float, percentage, or keyword) The distance between the top of the Block rectangle and the baseline for the first line of text, specified in user coordinates, as a percentage of the relevant font size (the first font size in the line if fixedleading=true, and the maximum of all font sizes in the line otherwise), or as a keyword. Default: leading.
  leading: The leading value determined for the first line; typical diacritical characters will touch the top of the fitbox.
  ascender: The ascender value determined for the first line; typical characters with larger ascenders, such as d and h, will touch the top of the fitbox.
  capheight: The capheight value determined for the first line; typical capital uppercase characters such as H will touch the top of the fitbox.
  xheight: The xheight value determined for the first line; typical lowercase characters such as x will touch the top of the fitbox.
If fixedleading=false, the maximum of all leading, ascender, xheight, or capheight values found in the first line will be used.
fitmethod
(Keyword) Strategy to use if the supplied content doesn't fit into the box. Possible values are auto, nofit, clip. Default: auto. For Textflow Blocks where the Block is too small for the text, the interpretation is as follows:
  auto: fontsize and leading will be decreased until the text fits.
  nofit: Text will run beyond the bottom margin of the Block.
  clip: Text will be clipped at the Block margin.
lastlinedist
(Float, percentage, or keyword; will be ignored for fitmethod=nofit) The minimum distance between the baseline for the last line of text and the bottom of the fitbox, specified in user coordinates, as a percentage of the font size (the first font size in the line if fixedleading=true, and the maximum of all font sizes in the line otherwise), or as a keyword. Default: 0, i.e. the bottom of the fitbox will be used as baseline, and typical descenders will extend below the Block rectangle.
  descender: The descender value determined for the last line; typical characters with descenders, such as g and j, will touch the bottom of the fitbox.
If fixedleading=false, the maximum of all descender values found in the last line will be used.
linespreadlimit
(Float or percentage; only for verticalalign=justify) Maximum amount, in user coordinates or as a percentage of the leading, by which the leading may be increased for vertical justification. Default: 200%
maxlines
(Integer or keyword) The maximum number of lines in the fitbox, or the keyword auto, which means that as many lines as possible will be placed in the fitbox. When the maximum number of lines has been placed, PDF_fit_textflow( ) will return the string _boxfull.
minfontsize
(Float or percentage) Minimum font size allowed when text is scaled down to fit into the fitbox, especially for fitmethod=auto. The limit is specified in user coordinates or as a percentage of the height of the fitbox. If the limit is reached and the text still does not fit, the string _boxfull will be returned. Default: 0.1%
orientate
(Keyword) Specifies the desired orientation of the text when it is placed. Possible values are north, east, south, west. Default: north
rotate
(Float) Rotate the coordinate system, using the lower left corner of the fitbox as center and the specified value as rotation angle in degrees. This results in the box and the text being rotated. The rotation will be reset when the text has been placed. Default: 0
verticalalign
(Keyword) Vertical alignment of the text in the fitbox. Default: top.
  top: Formatting will start at the first line and continue downwards. If the text doesn't fill the fitbox, there may be whitespace below the text.
  center: The text will be vertically centered in the fitbox. If the text doesn't fill the fitbox, there may be whitespace both above and below the text.
  bottom: Formatting will start at the last line and continue upwards. If the text doesn't fill the fitbox, there may be whitespace above the text.
  justify: The text will be aligned with the top and bottom of the fitbox. To achieve this, the leading will be increased up to the limit specified by linespreadlimit. The height of the first line will only be increased if firstlinedist=leading.
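The verticalalign keywords reduce to simple arithmetic on the free space in the fitbox. The Java sketch below makes two simplifying assumptions. First, textHeight is the height of the formatted text with its normal leading. Second, justify spreads the free space evenly over the gaps between lines, capped by linespreadlimit converted to user units, and ignores the firstlinedist subtleties. The names are hypothetical and are not part of PDFlib.

```java
/**
 * Sketch of the verticalalign keywords for a Textflow fitbox.
 * Assumptions: textHeight is the height of the text with its normal
 * leading; justify spreads free space evenly over the line gaps.
 */
final class VerticalAlign {
    /** Empty space left above the text for top, center, and bottom. */
    static double spaceAbove(String verticalalign, double boxHeight, double textHeight) {
        double free = Math.max(0, boxHeight - textHeight);
        switch (verticalalign) {
            case "top":    return 0;          // whitespace (if any) ends up below
            case "center": return free / 2;   // whitespace split above and below
            case "bottom": return free;       // whitespace (if any) ends up above
            default: throw new IllegalArgumentException(verticalalign);
        }
    }

    /**
     * justify: leading increased so that the lines reach the bottom of the
     * box, but by no more than maxIncrease (linespreadlimit in user units;
     * the default 200% corresponds to 2 * leading).
     */
    static double justifiedLeading(double leading, int lines, double boxHeight,
                                   double textHeight, double maxIncrease) {
        if (lines < 2) {
            return leading;               // nothing to spread with a single line
        }
        double perGap = Math.max(0, boxHeight - textHeight) / (lines - 1);
        return leading + Math.min(perGap, maxIncrease);
    }

    public static void main(String[] args) {
        System.out.println(spaceAbove("center", 100, 60));            // 20.0
        System.out.println(justifiedLeading(12, 5, 100, 52, 2 * 12)); // 24.0
    }
}
```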
defaultpdf
defaultpdfpage
defaulttext
The following statement returns the name of Block number blocknum on page pagenum (Block and page counting start at 0):
blockname = p.pcos_get_string(doc, "pages[" + pagenum + "]/blocks[" + blocknum + "]/Name");
The returned Block name can subsequently be used to query the Block's properties or to populate the Block with text, image, or PDF content. If the specified Block doesn't exist, an exception will be thrown. You can avoid this by using the length prefix to determine the number of Blocks and therefore the maximum index in the Blocks array (keep in mind that the Block count will be one higher than the highest possible index, since array indexing starts at 0). In the path syntax for addressing Block properties, the following expressions are equivalent, assuming that the Block with sequential number <number> has its Name property set to <blockname>:
pages[...]/blocks[<number>]
pages[...]/blocks/<blockname>
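The length prefix mentioned above can bound a loop over all Blocks on a page, so that no out-of-range index is ever requested. The Java sketch below assumes that the two functions num and str stand in for p.pcos_get_number(doc, path) and p.pcos_get_string(doc, path). In real code they would wrap those pCOS calls and handle their exceptions. The in-memory map in main only fakes pCOS results for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.function.ToDoubleFunction;

final class BlockNames {
    /**
     * Returns the names of all Blocks on a page.
     * num and str stand in for p.pcos_get_number(doc, path) and
     * p.pcos_get_string(doc, path).
     */
    static List<String> list(ToDoubleFunction<String> num,
                             Function<String, String> str, int page) {
        // length: returns the number of Blocks; valid indexes are 0..count-1.
        int count = (int) num.applyAsDouble("length:pages[" + page + "]/blocks");
        List<String> names = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            names.add(str.apply("pages[" + page + "]/blocks[" + i + "]/Name"));
        }
        return names;
    }

    public static void main(String[] args) {
        // Fake pCOS results for a page carrying the Blocks job_title and logo.
        Map<String, Object> fake = new HashMap<>();
        fake.put("length:pages[0]/blocks", 2.0);
        fake.put("pages[0]/blocks[0]/Name", "job_title");
        fake.put("pages[0]/blocks[1]/Name", "logo");
        List<String> names = list(p -> (Double) fake.get(p), p -> (String) fake.get(p), 0);
        System.out.println(names); // [job_title, logo]
    }
}
```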
Finding Block coordinates. The two coordinate pairs (llx, lly) and (urx, ury) describing the lower left and upper right corner of a Block named foo can be queried as follows:
llx = p.pcos_get_number(doc, "pages[" + pagenum + "]/blocks/foo/Rect[0]");
lly = p.pcos_get_number(doc, "pages[" + pagenum + "]/blocks/foo/Rect[1]");
urx = p.pcos_get_number(doc, "pages[" + pagenum + "]/blocks/foo/Rect[2]");
ury = p.pcos_get_number(doc, "pages[" + pagenum + "]/blocks/foo/Rect[3]");
Note that these coordinates are provided in the default user coordinate system (with the origin in the bottom left corner, possibly modified by the page's CropBox), while the Block plugin displays the coordinates according to Acrobat's user interface coordinate system, with the origin in the upper left corner of the page. Since the Rect option for overriding Block coordinates does not take into account any modifications applied by the CropBox entry, the coordinates queried from the original Block cannot be used directly as new coordinates if a CropBox is present. As a workaround, you can use the refpoint and boxsize options. Also note that the topdown parameter is not taken into account when querying Block coordinates.
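To compare queried Rect values with the numbers shown in the Block plugin, the y values must be flipped against the page height. The Java sketch below assumes a page box whose lower left corner is at (0, 0) with no CropBox offset. If a CropBox is present, subtract its origin first. The sketch also computes the Block's width and height, which do not depend on the box origin; such values could presumably feed a size-based option like boxsize. The class name is hypothetical.

```java
import java.util.Arrays;

/**
 * Converts a Block Rect (default PDF user coordinates, origin at the lower
 * left) into top-left-origin values like those shown in Acrobat's UI.
 * ASSUMPTION: the page box starts at (0, 0) and no CropBox offset applies.
 */
final class BlockCoords {
    /** Returns {left, top, right, bottom}, with y measured downwards from the top. */
    static double[] toTopLeft(double[] rect, double pageHeight) {
        double llx = rect[0], lly = rect[1], urx = rect[2], ury = rect[3];
        return new double[] { llx, pageHeight - ury, urx, pageHeight - lly };
    }

    /** Width and height of the Block rectangle. */
    static double[] size(double[] rect) {
        return new double[] { rect[2] - rect[0], rect[3] - rect[1] };
    }

    public static void main(String[] args) {
        // Rect of the job_title Block from the PDF example in this chapter,
        // on a page with MediaBox [0 0 595 842].
        double[] rect = { 70, 740, 200, 800 };
        System.out.println(Arrays.toString(toTopLeft(rect, 842))); // [70.0, 42.0, 200.0, 102.0]
        System.out.println(Arrays.toString(size(rect)));           // [130.0, 60.0]
    }
}
```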
Querying custom properties. Custom properties can be queried as in the following example, where the property zipcode is queried from a Block named b1 on page pagenum:
zip = p.pcos_get_string(doc, "pages[" + pagenum + "]/blocks/b1/Custom/zipcode");
If you don't know which custom properties are actually present in a Block, you can determine their names at runtime. To find the name of the first custom property in a Block named b1, use the following:
propname = p.pcos_get_string(doc, "pages[" + pagenum + "]/blocks/b1/Custom[0].key");
Use increasing indexes instead of 0 to determine the names of all custom properties. Use the length prefix to determine the number of custom properties.
Non-existing Block properties and default values. Use the type prefix to determine whether a Block or property is actually present. If the type for a path is 0 or null, the respective object is not present in the PDF document. Note that for standard properties this means that the default value of the property will be used.
Name space for custom properties. To avoid confusion when PDF documents from different sources are exchanged, it is recommended to use an Internet domain name as a company-specific prefix in all custom property names, followed by a colon (:) and the actual property name. For example, ACME corporation would use the following property names:
acme.com:digits
acme.com:refnumber
Since standard and custom properties are stored differently in the Block, standard PDFlib property names (as defined in Section 11.6, Block Properties, page 288) will never conflict with custom property names.
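The length and type prefixes can be combined to read all custom properties of a Block without knowing their names in advance. In the Java sketch below, num and str are assumed stand-ins for p.pcos_get_number(doc, path) and p.pcos_get_string(doc, path). Values are read as strings, following the zipcode example above. Property names containing special characters, such as the colon in namespaced names, may need escaping in pCOS paths; consult the pCOS path syntax before relying on this. The fake data in main is made up for illustration.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.function.ToDoubleFunction;

final class CustomProps {
    /**
     * Reads all custom properties of a Block as name/value strings.
     * num and str stand in for p.pcos_get_number(doc, path) and
     * p.pcos_get_string(doc, path).
     */
    static Map<String, String> read(ToDoubleFunction<String> num,
                                    Function<String, String> str,
                                    int page, String block) {
        String base = "pages[" + page + "]/blocks/" + block + "/Custom";
        Map<String, String> props = new LinkedHashMap<>();
        // type: prefix: 0 (null) means the Block has no Custom dictionary.
        if (num.applyAsDouble("type:" + base) == 0) {
            return props;
        }
        int n = (int) num.applyAsDouble("length:" + base);
        for (int i = 0; i < n; i++) {
            String key = str.apply(base + "[" + i + "].key"); // i-th property name
            props.put(key, str.apply(base + "/" + key));      // its value
        }
        return props;
    }

    public static void main(String[] args) {
        Map<String, Object> fake = new HashMap<>();
        fake.put("type:pages[0]/blocks/b1/Custom", 6.0); // any non-zero type: present
        fake.put("length:pages[0]/blocks/b1/Custom", 2.0);
        fake.put("pages[0]/blocks/b1/Custom[0].key", "acme.com:digits");
        fake.put("pages[0]/blocks/b1/Custom/acme.com:digits", "5");
        fake.put("pages[0]/blocks/b1/Custom[1].key", "acme.com:refnumber");
        fake.put("pages[0]/blocks/b1/Custom/acme.com:refnumber", "X-17");
        Map<String, String> props = read(p -> (Double) fake.getOrDefault(p, 0.0),
                                         p -> (String) fake.get(p), 0, "b1");
        System.out.println(props); // {acme.com:digits=5, acme.com:refnumber=X-17}
    }
}
```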
LastModified
(Date string; required) The date and time when the Blocks on the page were created or most recently modified.
Private
(Dictionary; required) A Block list (see Table 11.15)
A Block list is a dictionary containing general information about Block processing, plus a list of all Blocks on the page. Table 11.15 lists the keys in a Block list dictionary.
Table 11.15 Entries in a Block list dictionary
Version
(Number; required) The version number of the Block specification to which the file complies. This document describes version 8 of the Block specification.
Blocks
(Dictionary; required) Each key is a name object containing the name of a Block; the corresponding value is the Block dictionary for this Block (see Table 11.17). The /Name key in the Block dictionary must be identical to the Block's name in this dictionary.
PluginVersion
(String; required unless the pdfmark key is present¹) A string containing a version identification of the PDFlib Block plugin which has been used to create the Blocks.
pdfmark
(Boolean; required unless the PluginVersion key is present¹) Must be true if the Block list has been generated by use of pdfmarks.
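The rules in Table 11.15 translate directly into consistency checks. The Java sketch below checks a Block list whose PDF objects have already been parsed into plain Java values, with absent keys passed as null; that representation and the class name are assumptions of this example. It does not parse PDF itself.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Sketch of consistency checks for a Block list dictionary (Table 11.15).
 * ASSUMPTION: parsed PDF objects are given as plain Java values; absent
 * keys are passed as null.
 */
final class BlockListCheck {
    /**
     * @param blocks maps each key of /Blocks to the /Name stored in that Block dictionary
     */
    static List<String> check(Number version, Map<String, String> blocks,
                              String pluginVersion, Boolean pdfmark) {
        List<String> errors = new ArrayList<>();
        if (version == null) {
            errors.add("/Version is required");
        }
        if (blocks == null) {
            errors.add("/Blocks is required");
        } else {
            for (Map.Entry<String, String> e : blocks.entrySet()) {
                if (!e.getKey().equals(e.getValue())) {
                    errors.add("/Name of Block " + e.getKey() + " differs from its key");
                }
            }
        }
        // Each of the two keys is required unless the other one is present.
        if (pluginVersion == null && pdfmark == null) {
            errors.add("either /PluginVersion or /pdfmark is required");
        }
        return errors;
    }

    public static void main(String[] args) {
        // Values taken from the Block list in the PDF example of this chapter.
        Map<String, String> blocks = new LinkedHashMap<>();
        blocks.put("job_title", "job_title");
        blocks.put("logo", "logo");
        System.out.println(check(8, blocks, "4.0", null)); // []
    }
}
```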
Data types for Block properties. Properties support the same data types as option lists except handles and specialized lists such as action lists. Table 11.16 details how these types are mapped to PDF data types.
Table 11.16 Data types for Block properties
boolean
(Boolean)
string
(String)
keyword (name)
(Name) It is an error to provide keywords outside the list of keywords supported by a particular property.
float, integer
(Number) While option lists support both point and comma as decimal separators, PDF numbers support only point.
percentage
(Array with two elements) The first element in the array is the number; the second element is a string containing a percent character.
list
(Array)
color
(Array with two or three elements) The first element in the array specifies a color space, and the second element specifies a color value as follows. The following entries are supported for the first element in the array:
  /DeviceGray: The second element is a single gray value.
  /DeviceRGB: The second element is an array of three RGB values.
  /DeviceCMYK: The second element is an array of four CMYK values.
  [/Separation/spotname]: The first element is an array containing the keyword /Separation and a spot color name. The second element is a tint value. The optional third element in the array specifies an alternate color for the spot color, which is itself a color array in one of the /DeviceGray, /DeviceRGB, /DeviceCMYK, or /Lab color spaces. If the alternate color is missing, the spot color name must either refer to a color which is known internally to PDFlib, or which has been defined by the application at runtime.
  [/Lab]: The first element is an array containing the keyword /Lab. The second element is an array of three Lab values.
To specify the absence of color, the respective property must be omitted.
unichar
(Text string) Unicode strings in utf16be format, starting with the U+FEFF BOM
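When Block dictionaries are generated by a program rather than by the plugin, each property value must be written in the PDF form listed in Table 11.16. The Java sketch below shows one way to do that. The helper names are ours and are not a PDFlib API. Names escape irregular bytes as #XX, and literal strings assume ASCII text. The unichar form is written as a hex string for clarity; a literal string with the same bytes would also be valid PDF.

```java
import java.math.BigDecimal;
import java.nio.charset.StandardCharsets;

/** Sketch: Block property values in the PDF syntax of Table 11.16. */
final class BlockValues {
    static String bool(boolean b) { return b ? "true" : "false"; }

    /** PDF numbers always use a point as decimal separator. */
    static String number(double d) {
        return BigDecimal.valueOf(d).stripTrailingZeros().toPlainString();
    }

    /** Keywords are PDF names; irregular bytes are written as #XX. */
    static String name(String s) {
        StringBuilder sb = new StringBuilder("/");
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            int c = b & 0xFF;
            boolean regular = c > 0x20 && c < 0x7F && "()<>[]{}/%#".indexOf(c) < 0;
            sb.append(regular ? String.valueOf((char) c) : String.format("#%02X", c));
        }
        return sb.toString();
    }

    /** Literal string with backslash and parentheses escaped (ASCII assumed). */
    static String string(String s) {
        return "(" + s.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)") + ")";
    }

    /** percentage: two-element array of the number and a "%" string. */
    static String percentage(double value) { return "[" + number(value) + " (%)]"; }

    static String gray(double g) { return "[/DeviceGray " + number(g) + "]"; }

    static String rgb(double r, double g, double b) {
        return "[/DeviceRGB [" + number(r) + " " + number(g) + " " + number(b) + "]]";
    }

    /** Spot color without alternate color: [[/Separation /name] tint]. */
    static String spot(String spotName, double tint) {
        return "[[/Separation " + name(spotName) + "] " + number(tint) + "]";
    }

    /** unichar: UTF-16BE preceded by the FE FF BOM, written as a hex string. */
    static String unichar(String s) {
        // Java's "UTF-16" encoder emits big-endian data with a leading FE FF.
        StringBuilder sb = new StringBuilder("<");
        for (byte b : s.getBytes(StandardCharsets.UTF_16)) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.append('>').toString();
    }

    public static void main(String[] args) {
        System.out.println(number(12.0));                 // 12
        System.out.println(percentage(50));               // [50 (%)]
        System.out.println(rgb(1, 0, 0.5));               // [/DeviceRGB [1 0 0.5]]
        System.out.println(spot("PANTONE 123 C", 1));     // [[/Separation /PANTONE#20123#20C] 1]
        System.out.println(unichar("A"));                 // <FEFF0041>
    }
}
```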
administrative properties
(Some keys are required) Administrative properties according to Table 11.4
rectangle properties
(Some keys are required) Rectangle properties according to Table 11.5
appearance properties
(Some keys are required) Appearance properties for all Block types according to Table 11.6, and text appearance properties for Textline and Textflow Blocks according to Table 11.7
text preparation properties
(Optional) Text preparation properties for Textline and Textflow Blocks according to Table 11.8
text formatting properties
(Optional) Text formatting properties for Textline and Textflow Blocks according to Table 11.9
object fitting properties
(Optional) Object fitting properties for Textline, Image, and PDF Blocks according to Table 11.10, and fitting properties for Textflow Blocks according to Table 11.11
properties for default contents
(Optional) Properties for default contents according to Table 11.12
Custom
(Dictionary; optional) A dictionary containing key/value pairs for custom properties according to Table 11.13.
Example. The following fragment shows the PDF code for two Blocks, a text Block called job_title and an image Block called logo. The text Block contains a custom property called format:
<< /Contents 12 0 R
   /Type /Page
   /Parent 1 0 R
   /MediaBox [ 0 0 595 842 ]
   /PieceInfo << /PDFlib 13 0 R >>
>>

13 0 obj
<< /Private <<
      /Blocks <<
         /job_title 14 0 R
         /logo 15 0 R
      >>
      /Version 8
      /PluginVersion (4.0)
   >>
   /LastModified (D:20090813200730)
>>
endobj

14 0 obj
<< /Type /Block
   /Rect [ 70 740 200 800 ]
   /Name /job_title
   /Subtype /Text
   /fitmethod /auto
   /fontname (Helvetica)
   /fontsize 12
   /Custom << /format 5 >>
>>
endobj

15 0 obj
<< /Type /Block
   /Rect [ 250 700 400 800 ]
   /Name /logo
   /Subtype /Image
   /fitmethod /auto
>>
endobj
      /logo <<
         /Type /Block
         /Name /logo
         /Subtype /Image
         /Rect [ 250 700 400 800 ]
         /fitmethod /auto
      >>
   >>
/PUT pdfmark
A Revision History
September 30, 2009 > Updates for PDFlib 8
March 13, 2009 > Various updates and corrections for PDFlib 7.0.4
February 13, 2008 > Various updates and corrections for PDFlib 7.0.3
August 08, 2007 > Various updates and corrections for PDFlib 7.0.2
February 19, 2007 > Various updates and corrections for PDFlib 7.0.1
October 03, 2006 > Updates and restructuring for PDFlib 7.0.0
February 21, 2006 > Various updates and corrections for PDFlib 6.0.3; added Ruby section
August 09, 2005 > Various updates and corrections for PDFlib 6.0.2
November 17, 2004 > Minor updates and corrections for PDFlib 6.0.1 > introduced new format for language-specific function prototypes in chapter 8 > added hypertext examples in chapter 3
June 18, 2004 > Major changes for PDFlib 6
January 21, 2004 > Minor additions and corrections for PDFlib 5.0.3
September 15, 2003 > Minor additions and corrections for PDFlib 5.0.2; added block specification
May 26, 2003 > Minor updates and corrections for PDFlib 5.0.1
March 26, 2003 > Major changes and rewrite for PDFlib 5.0.0
June 14, 2002 > Minor changes for PDFlib 4.0.3 and extensions for the .NET binding
January 26, 2002 > Minor changes for PDFlib 4.0.2 and extensions for the IBM eServer edition
May 17, 2001 > Minor changes for PDFlib 4.0.1
April 1, 2001 > Documents PDI and other features of PDFlib 4.0.0
February 5, 2001 > Documents the template and CMYK features in PDFlib 3.5.0
December 22, 2000 > ColdFusion documentation and additions for PDFlib 3.03; separate COM edition of the manual
August 8, 2000 > Delphi documentation and minor additions for PDFlib 3.02
July 1, 2000 > Additions and clarifications for PDFlib 3.01
Feb. 20, 2000 > Changes for PDFlib 3.0
Aug. 2, 1999 > Minor changes and additions for PDFlib 2.01
June 29, 1999 > Separate sections for the individual language bindings > Extensions for PDFlib 2.0
Feb. 1, 1999 > Minor changes for PDFlib 1.0 (not publicly released)
Aug. 10, 1998 > Extensions for PDFlib 0.7 (only for a single customer)
July 8, 1998 > First attempt at describing PDFlib scripting support in PDFlib 0.6
Feb. 25, 1998 > Slightly expanded the manual to cover PDFlib 0.5
Sept. 22, 1997 > First public release of PDFlib 0.4 and this manual
Index
A
Acrobat plugin for creating Blocks 267
Adobe Font Metrics (AFM) 106
advanced linebreaking 138, 193
AES (Advanced Encryption Standard) 238
AFM (Adobe Font Metrics) 106
arrays 223
ArtBox 62
artificial font styles 132
AS/400 57
ascender 130
asciifile parameter 57
auto: see hypertextformat
autocidfont parameter 117
autosubsetting parameter 116
B
baseline compression 153
Big Five 102
bindings 25
BleedBox 62
Blocks 267
  plugin 267
  properties 269
BMP 155
builtin encoding 123
Byte Order Mark (BOM) 85, 88
bytes: see hypertextformat
byteserving 241
C
C binding 28
C++ binding 31
capheight 130
categories of resources 52
CCITT 155
CCSID 93
CFF (Compact Font Format) 103
character metrics 130
character references 95, 96
characters and glyphs 84
characters per inch 131
Chinese 101, 102, 140
CIE L*a*b* color space 70
CJK (Chinese, Japanese, Korean)
  configuration 100
  custom fonts 141
  standard fonts 100
  Windows code pages 102
clip 62
clone page boxes 169
CMaps 100, 101
Cobol binding 26
code page: Microsoft Windows 1250-1258 92
COM (Component Object Model) binding 27
commercial license 12
content strings 86
content strings in non-Unicode capable languages 87
coordinate system 59
  metric 59
  top-down 60
copyoutputintent option 248
core fonts 111
CPI (characters per inch) 131
CropBox 62
current point 63
currentx and currenty parameter 130
custom encoding 93
D
default coordinate system 59
defaultgray/rgb/cmyk parameters 72
descender 130
dictionaries 223
document info fields 219
downsampling 151
dpi calculations 151
E
EBCDIC 57
ebcdic encoding 92
ebcdicutf8: see hypertextformat
embedding fonts 115
encoding
  CJK 100
  custom 93
  fetching from the system 93
encrypted PDF documents 233
encryption 238
encryption status 220
environment variable PDFLIBRESOURCE 55
error handling 49
errorpolicy parameter 162
escape sequences 95
EUDC (end-user defined characters) 107, 143
Euro character 122
examples
  document info fields 219
  encryption status 220
  fonts in a document 220
  number of pages 219
  page size 219
  pCOS paths 219
  writing mode 220
exceptions 49
explicit transparency 157
F
features of PDFlib 21
fill 62
font metrics 130
font style names for Windows 113
font styles 132
fonts
  AFM files 106
  embedding 115
  legal aspects of embedding 116
  monospaced 131
  OpenType 103
  PDF core set 111
  PFA files 106
  PFB files 106
  PFM files 106
  PostScript 103, 106
  resource configuration 52
  subsetting 116
  TrueType 103
  Type 1 106
  Type 3 (user-defined) fonts 108
  Type 3 108
  user-defined (Type 3) 108
fonts in a document 220
FontSpecific encoding 123
form fields: converting to blocks 277
form XObjects 64
G
gaiji characters 104
GBK 102
GIF 155
glyph availability 126
glyph id addressing 98
glyph name references 97
glyph replacement 121
glyphs 84
gradients 66
grid.pdf 59
Groovy 35
H
HKS colors 69
horizontal writing mode 140
host encoding 91
host fonts 112
HTML character references 95
hypertext strings 86
  in non-Unicode capable languages 87
hypertextformat parameter 88
I
IBM zSeries and iSeries 57
ignoremask 157
image data, re-using 151
image file formats 153
image mask 156, 158
image scaling 151
image:iccprofile parameter 71
inch 59
in-core PDF generation 56
inline images 152
invisible text 291
iSeries 57
ISO 10646 83
ISO 15930 242
ISO 19005 249
ISO 32000-1 235
ISO 8859-2 to -15 92
J
Japanese 101, 102, 140
Java binding 33
Javadoc 35
JBIG2 154
JFIF 154
Johab 102
JPEG 153
JPEG 2000 154
K
kerning 131
Korean 101, 102, 140
L
language bindings: see bindings
layers and PDI 162
leading 130
line spacing 130
linearized PDF 241
LWFN (LaserWriter Font) 106
M
macroman encoding 91, 92
macroman_apple encoding 122
makepsres utility 52
mask 157
masked 157
masking images 156
masterpassword 239
MediaBox 62
memory, generating PDF documents in 56
metric coordinates 59
metrics 130
millimeters 59
monospaced fonts 131
multi-page image files 152
N
name strings 86
  in non-Unicode capable languages 87
nesting exceptions 29
.NET binding 36
number of pages 219
O
OpenType fonts 103
optimized PDF 241
outline text 291
output intent 246
  for PDF/A 250
  for PDF/X 243
overline parameter 133
P
page 152
page descriptions 59
page formats 61
page size 219
  limitations in Acrobat 61
page-at-a-time download 241
PANTONE colors 67
passwords 238
  good and bad 239
path 62
path objects 63
patterns 66
pCOS 219
  data types 221
  encryption 233
  path syntax 224
  pseudo objects 226
pCOS interface 219
PDF import library (PDI) 160
PDF Reference Manual 219
PDF/A 249
PDF/X 242
PDF_EXIT_TRY( ) 29
PDF_get_buffer() 56
PDFlib Blocks 267
PDFlib features 21
PDFlib Personalization Server (PPS) 267
pdflib.upr 55
PDFLIBRESOURCE environment variable 55
PDI 160
pdiusebox 162
Perl binding 37
permissions 238, 240
PFA (Printer Font ASCII) 106
PFB (Printer Font Binary) 106
PFM (Printer Font Metrics) 106
PHP binding 39
plugin for creating Blocks 267
PNG 153, 157
PostScript fonts 103, 106
PPS (PDFlib Personalization Server) 267
Printer Font ASCII (PFA) 106
Printer Font Binary (PFB) 106
Printer Font Metrics (PFM) 106
Python binding 41
R
raw image data 156
REALbasic binding 42
rendering intents 70
renderingintent option 70
resource category 52
resourcefile parameter 55
rotating objects 60
RPG binding 43
Ruby binding 46
S
S/390 57
scaling images 151
script-specific linebreaking 138, 193
SearchPath parameter 53
security 238
setcolor:iccprofilegray/rgb/cmyk parameters 71
shadings 66
Shift-JIS 102
SING fonts 103
smooth blends 66
SPIFF 154
spot color (separation color space) 67
sRGB color space 71
standard output conditions
  for PDF/A 252
  for PDF/X 246
strikeout parameter 133
strings in option lists 89
stroke 62
style names for Windows 113
subpath 62
subscript 131
subsetminsize parameter 116
subsetting 116
superscript 131
Symbol font 123
system encoding support 93
T
Tcl binding 47
templates 64
temporary disk space requirements 241
text metrics 130
text position 130
text variations 130
textformat parameter 88
textlen for Symbol fonts in Textflow 189
textrendering parameter 133
textx and texty parameter 130
TIFF 155
top-down coordinates 60
transparency 156
TrimBox 62
TrueType fonts 103
TTC (TrueType Collection) 107, 141, 142
TTF (TrueType font) 103
Type 1 fonts 106
Type 3 (user-defined) fonts 108
U
UHC 102
underline parameter 133
units 59
UPR (Unix PostScript Resource) 52
  file format 53
  file searching 55
usehypertextencoding parameter 88
user space 59
usercoordinates parameter 59
user-defined (Type 3) fonts 108
userpassword 239
UTF formats 84
utf16: see hypertextformat
utf16be: see hypertextformat
utf16le: see hypertextformat
utf8: see hypertextformat
V
vertical writing mode 140
W
web-optimized PDF 241
winansi encoding 92
writing mode 140, 220
writing modes 140
X
xheight 130
XMP metadata 220
XObjects 64
Z
ZapfDingbats font 123
zSeries 57