The XeTeX project: typesetting for the rest of the world
Jonathan Kew
SIL International
Horsleys Green
High Wycombe HP14 3XL
England
Abstract
This paper will introduce the XeTeX project, an extension of TeX that integrates its
typesetting capabilities with the Unicode text encoding standard, supporting all the
world's scripts, and with modern font technologies provided by today's operating
systems and text layout services.
XeTeX offers the potential to be "TeX for the rest of the world" in several senses, as
will be discussed and demonstrated:
• Much of the intimidating complexity of managing a TeX installation (in particular,
  the process of installing and using new fonts) is eliminated by XeTeX's integration
  with the host operating system's font management. This greatly reduces the barrier
  to entry into the TeX world for many non-technical users, and provides a richer and
  more flexible typographic environment.
• Because XeTeX is based on Unicode, the universal character encoding standard, and
  uses OpenType and AAT layout features in modern fonts to support complex non-Latin
  writing systems, it can work with Asian, Middle Eastern, and other traditionally
  "difficult" languages just as readily as with European languages.
• XeTeX was initially designed and implemented for Mac OS X, leveraging several key
  technologies available on that platform. This meant, however, that it was available
  only to a fairly small minority of potential users. With the introduction of XeTeX
  for Linux, the benefits of XeTeX become available to a new and wider community of
  users.
Introduction
XeTeX¹ is an extension of the TeX processor, designed to integrate TeX's typesetting
language and document formatting capabilities with the Unicode/ISO 10646 universal
character encoding for all the world's scripts, and with the font technologies
available on today's computer systems. This includes fonts that support complex
non-Latin writing systems and very large character sets, as well as the wide variety
of Western typefaces now available.
XeTeX is in fact based on ε-TeX, and therefore includes a number of well-established
extensions to TeX. These include additional registers (\count, \dimen, \box, etc.)
beyond the 256 of each that TeX provides; various new conditional commands, tracing
features, etc.; and of particular significance for multilingual work, the TeX--XeT
extension for bidirectional layout.
¹ The name XeTeX was inspired by the idea of a Mac OS X extension (hence the "X"
prefix) to ε-TeX; and as one of its intended uses is for bidirectional scripts such as
Hebrew and Arabic, the name was designed to be reversible. The second letter should
ideally be U+018E LATIN CAPITAL LETTER REVERSED E, but as few current fonts support
this character, it is normal to use a rotated or reflected E glyph. The name is
pronounced as if it were written "zee-TeX".
The TeX extensions inherited from ε-TeX are not discussed further here, as they are
already described in the ε-TeX documentation², except to note that for right-to-left
scripts in XeTeX, it is necessary to set \TeXXeTstate=1 and make proper use of the
direction-changing commands \beginR, \endR, etc. Without these, there will still be
some right-to-left behavior due to the inherent directionality defined by the Unicode
standard for characters belonging to Hebrew, Arabic and similar scripts, but overall
layout will not be correct.
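As a minimal sketch of the settings just described, a Plain XeTeX file might embed a right-to-left phrase in left-to-right text as follows. The font names and the sample text are assumptions chosen for illustration; any installed fonts covering Latin and Hebrew should serve equally well.

```tex
% Compile with: xetex bidi-sketch.tex
% Assumption: the fonts "Charis SIL" and "Ezra SIL" are installed;
% substitute any Latin and Hebrew fonts available on the system.
\TeXXeTstate=1                 % enable the TeX--XeT bidirectional extension
\font\lat="Charis SIL" at 10pt
\font\heb="Ezra SIL" at 12pt
% \beginR ... \endR marks the enclosed text as a right-to-left run
\lat The greeting {\beginR\heb שלום עולם\endR} is set from right to left.
\bye
```

The group keeps the font change local, and \beginR is closed by \endR within the same paragraph.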
Using XeTeX in conjunction with higher-level macro packages such as LaTeX or ConTeXt
provides a powerful and flexible typesetting system that combines the strengths of
these well-developed markup systems and formatting tools with easy support for a huge
range of industry-standard fonts and all the scripts and languages supported by the
Unicode standard.
² E.g., The ε-TeX Short Reference Manual,
http://www.staff.uni-mainz.de/knappen/etex_ref.html.
XIV Ogólnopolska Konferencja Polskiej Grupy Użytkowników Systemu TeX
A rich world of fonts
Font installation. In its early years, many users saw TeX as being inextricably linked
with the Computer Modern typeface family created by Don Knuth specifically to work
with TeX. In principle, other typefaces could be used, but few were available in a
form that the TeX software could use, and few users knew how to install or access
them.
As PostScript printers became widespread, TeX macro packages and supporting files
(.tfms, etc.) for fonts such as Times Roman and Helvetica were created and became part
of typical TeX installations. The New Font Selection Scheme (NFSS) for LaTeX played a
key role in allowing users easier access to alternative typefaces. A simple
\usepackage{times} in the preamble of a LaTeX document could change the fonts
throughout an article in a co-ordinated fashion.
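For readers who have not seen this style of font selection, a minimal sketch of such a document follows. It assumes a traditional LaTeX installation that includes the times package; the body text is invented for illustration.

```tex
% Compile with: latex (or pdflatex) times-sketch.tex
\documentclass{article}
\usepackage{times}   % NFSS-based package: one line changes all three families
\begin{document}
Roman text is now set in Times, \textsf{sans-serif text in Helvetica},
and \texttt{typewriter text in Courier}.
\end{document}
```

The package works by redefining the default roman, sans-serif and typewriter families, which is why a single line in the preamble is enough.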
However, for most users the choice of typefaces was still limited to those for which a
preconfigured LaTeX package was available. Although various tools, scripts, and
articles tried to simplify and explain the steps needed, most non-technical users were
still overwhelmed by the apparent complexity and the technical knowledge required. (Do
I want to use OT1 or T1 encoding, or perhaps Y&Y? How do I make dvips use a .ttf font?
What exactly do I put in my .fd file, and where does that file need to go? Do I need
to create a virtual font? How do I activate new .map file entries? Etc., etc., with
apologies to those for whom these issues are second nature.)
For an average user of a modern desktop computer and typical GUI software, using a new
font in a document involves approximately two steps:
1. Drop the .ttf or .otf file into the computer's Fonts folder;
2. Select the font name from a menu in any application.
Any software (especially software that relates to typography) that requires a longer
or more complex procedure will be perceived as "user-unfriendly" and "hard to use",
and will face a barrier to wide acceptance.
XeTeX aims to bring this level of simplicity to the use of fonts with TeX. While
selecting a font from a menu of installed fonts does not directly fit the TeX
paradigm, the use of a new font is similarly straightforward:
1. Drop the .ttf or .otf file into the computer's Fonts folder;
2. Specify the font by name in the TeX document.
In Plain TeX terms, this second step might be:
\font\myfont="Charis SIL" at 9pt
\myfont Hello World
which results in "Hello World" in the typeset document.
LaTeX users do not normally declare fonts directly with TeX's \font command. Instead,
they can say things like \setromanfont{Charis SIL}³ in the preamble of the document.
The present article, for example, includes the lines:
\usepackage{fontspec}
\setromanfont{Adobe Garamond Pro}
\setmonofont[Scale=MatchLowercase]
{Andale Mono WT J}
These simple declarations are sufficient to use Adobe Garamond Pro (an OpenType font)
as the primary typeface family throughout the article, with Andale Mono WT J (a
monospaced TrueType font with an extended character set) for typewriter text, scaled
to match the lowercase height of the Garamond. The fonts were installed by dropping
them in the computer's Fonts folder; no additional TeX-specific steps such as file
format conversions were required, no .tfms, no .fds, no .map files, etc.
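Placed in a complete document, such declarations might look like the following sketch. The choice of Charis SIL for the roman family is an assumption made so that the example does not depend on a commercial font; any installed font name can be substituted. Later releases of fontspec document the roman-family command under the name \setmainfont.

```tex
% Compile with: xelatex fontspec-sketch.tex
% Assumption: "Charis SIL" and "Andale Mono WT J" are installed in the
% system's Fonts folder; replace them with any installed font names.
\documentclass{article}
\usepackage{fontspec}
\setromanfont{Charis SIL}                       % primary text family
\setmonofont[Scale=MatchLowercase]{Andale Mono WT J}
\begin{document}
Body text in the roman family, with \texttt{typewriter text} scaled so
that its lowercase letters match the height of the surrounding text.
\end{document}
```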
Rich typographic features. Modern OpenType and AAT fonts may provide a variety of
sophisticated typographic features, far beyond the simple ligatures and kerning
familiar to TeX users. For example, the cursive Zapfino font contains many alternate
forms for use in specific contexts, as well as alternates that can be explicitly
chosen by the user:
\font\zapfino = "Zapfino" at 7pt \zapfino
A sample of Zapfino using the default
settings built in to the font.
\font\zapfiii = "Zapfino:Stylistic
variants=Third variant glyph set"
at 7pt \zapfiii
A sample of Zapfino using the third of
several variant settings.
[Typeset output: the two sample sentences set in Zapfino, the first with the font's
default settings and the second with the third variant glyph set.]
³ This relies on the fontspec package by Will Robertson, which integrates XeTeX font
support with the standard LaTeX font selection mechanisms.
Bachotek, 29 April to 2 May 2006
Regular text faces may also include a number of interesting features, such as true
Small Capitals, choice of lining (0123456789) or oldstyle (0123456789) numerals,
automatic formation of arbitrary fractions (98/765) and others. The \font command
accepts options to select whatever OpenType or AAT typographic features the font
supports; or for LaTeX users, the fontspec package provides a higher-level, unified
interface to such features, independent of the particular font technology. The first
sentence of this paragraph, for example, appears in the source document as:
Regular text faces may also include a
number of interesting features, such as
true {\addfontfeature{Letters=SmallCaps}
Small Capitals}, choice of lining
(0123456789) or oldstyle ({\addfontfeature
{Numbers=Lowercase}0123456789})
numerals, automatic formation of arbitrary
fractions ({\addfontfeature{Fractions=On}%
98/765}) and others.
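For Plain TeX users, the same features can be requested through options in the \font declaration itself. The sketch below assumes an installed OpenType font that provides the small caps (smcp), oldstyle numeral (onum) and fraction (frac) features; Adobe Garamond Pro is used only because it is the text face mentioned above, and the macro names are invented for illustration.

```tex
% Compile with: xetex features-sketch.tex
% Assumption: the named font is installed and provides these features;
% any OpenType font with smcp, onum and frac can be substituted.
\font\body="Adobe Garamond Pro" at 10pt
\font\smallcaps="Adobe Garamond Pro:+smcp" at 10pt   % true small capitals
\font\oldstyle="Adobe Garamond Pro:+onum" at 10pt    % oldstyle numerals
\font\fractions="Adobe Garamond Pro:+frac" at 10pt   % automatic fractions
\body Regular text with {\smallcaps Small Capitals}, lining 0123456789 or
oldstyle {\oldstyle 0123456789} numerals, and fractions such as
{\fractions 98/765}.
\bye
```

OpenType features are selected here by their four-letter tags with a leading plus sign, whereas AAT features are selected by name, as in the Zapfino example.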
Any language, any script
Unlike TeX, which treated 8-bit characters as the fundamental units of text, XeTeX is
based on the Unicode character set. By default, it reads input text as Unicode
(supporting both UTF-8 and UTF-16), and expects Unicode-compliant fonts so that any
valid Unicode character can be directly typeset, provided the font in use supports the
relevant range of Unicode.
At a simple level, this means that with Unicode-compliant fonts, a wide range of
accented and other "special" characters can be used with no special effort; they just
work:
\font\iwona="Iwona-Medium" at 9.5pt \iwona
Hej Slované, ještě naše slovanská řeč žije.
Óðinn átti tvá brœðr. Hét annarr Vé,
en annarr Vílir.
\font\charis="Charis SIL" at 9pt \charis
Dünyayı verelim çocuklara hiç değilse bir
günlüğüne.
Kur bė́ga Šešùpė, kur Nẽmunas tẽka, taĩ mū́sų
tėvỹnė, gražì Lietuvà.
[Typeset output: the same four sentences, the first two set in Iwona Medium and the
last two in Charis SIL.]
In addition to direct input of Unicode text, it is possible to use \char with Unicode
character codes, so that \char"0164\char"0119\char"015B\char"0165 will produce "Ťęśť".
With an appropriate font selected, even characters such as Ugaritic 𐎄 or Linear B 𐂂
can be printed using their standard Unicode codepoints (those were \char"10384 and
\char"10082, using the Code2001 font).
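Collected into a small Plain XeTeX file, the \char examples above might be tried as follows. This is only a sketch: it assumes that Charis SIL and Code2001 are installed, and the macro names are chosen for illustration.

```tex
% Compile with: xetex char-sketch.tex
% Assumption: "Charis SIL" and "Code2001" are installed.
\font\charis="Charis SIL" at 10pt
\font\code="Code2001" at 10pt
% Four Latin letters with diacritics, given by hexadecimal code
\charis \char"0164\char"0119\char"015B\char"0165
\par
% Codepoints beyond U+FFFF are written in the same way
\code \char"10384 \char"10082
\bye
```

Note that TeX's hexadecimal notation requires the digits A to F in uppercase.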
Language-specific variants. OpenType fonts may contain variant glyphs or behavior
designed to support the typographic practices of specific languages. XeTeX can access
these features by adding a language code to the \font declaration; for example,
Vietnamese uses different diacritic placement rules than the default "stacking" that
is expected for arbitrary combinations of diacritics in generic Latin script:
\font\D="Doulos SIL" at 9pt
\font\V="Doulos SIL:language=VIT" at 9pt
\D cung cấp một con số duy nhất cho mỗi ký tự
\V cung cấp một con số duy nhất cho mỗi ký tự
[Typeset output: the same line set twice in Doulos SIL, first with the default
stacked diacritics and then with Vietnamese diacritic placement.]
Large character sets. Because XeTeX uses Unicode as its text encoding, large character
sets such as those needed for Chinese and other East Asian languages present no real
difficulties. Chinese characters are simply "letters" in the character set, just like
English; all that is required is to select an appropriate font:
\font\myfont="STFangsong" at 10pt
% select a font that supports Chinese
\myfont %
Unicode...
This would be sufficient to print the Chinese characters. An additional complication
for typesetting running text is that some of these languages are written without word
spaces, so that TeX has no natural opportunity to break paragraphs into lines, or to
justify lines to a precise width. XeTeX solves this by offering a mechanism to find
line-breaks according to the Unicode-based break rules, which can vary according to
the settings of a specific locale (for example, Thai requires rules based on a
dictionary to help find valid word boundaries). Further, glue can be introduced at
each potential break position, so that the resulting lines of text have sufficient
flexibility to be justified:
\XeTeXlinebreaklocale "zh"
% find line-break positions according
% to "zh" (Chinese) locale's rules
\XeTeXlinebreakskip = 0pt plus 1pt
% add a little stretchability to
% permit justification
Using these commands, XeTeX typesets East Asian languages just as readily as English:
[Typeset output: a justified paragraph of Chinese text.]
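Putting the font selection and the line-breaking commands together, a complete Plain XeTeX file might look like the sketch below. The Chinese sentence is a sample chosen for illustration, not the text used in the original article, and the narrow \hsize is set only to force several line breaks. The STFangsong font is assumed to be installed.

```tex
% Compile with: xetex cjk-sketch.tex
% Assumption: "STFangsong" is installed; any font covering Chinese will do.
\font\zh="STFangsong" at 10pt
\XeTeXlinebreaklocale "zh"          % find breaks by the Chinese locale's rules
\XeTeXlinebreakskip = 0pt plus 1pt  % stretchable glue at each break position
\hsize=6cm \parindent=0pt           % narrow measure, to show justification
\zh Unicode 给每个字符提供了一个唯一的数字，不论是什么平台，不论是什么程序，不论是什么语言。
\bye
```

For Thai, the same approach applies with the locale code "th", so that the dictionary-based rules are used to find word boundaries.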