The XeTeX Project: Typesetting for the Rest of the World: Jonathan Kew

XeTeX is an extension of TeX that integrates its typesetting capabilities with Unicode and modern font technologies. This allows XeTeX to support all languages and scripts through Unicode-compliant fonts. It also simplifies font installation and use: any font can be used simply by installing it on the system. XeTeX provides access to the advanced typographic features of modern fonts, and by using Unicode and native fonts it can typeset any language as easily as European languages.
Copyright
© Attribution Non-Commercial (BY-NC)
The XeTeX project: typesetting for the rest of the world
Jonathan Kew
SIL International
Horsleys Green
High Wycombe HP14 3XL
England
Abstract
This paper will introduce the XeTeX project, an extension of TeX that integrates its typesetting capabilities with the Unicode text encoding standard, supporting all the world's scripts, and with modern font technologies provided by today's operating systems and text layout services.
XeTeX offers the potential to be "TeX for the rest of the world" in several senses, as will be discussed and demonstrated:
Much of the intimidating complexity of managing a TeX installation (in particular, the process of installing and using new fonts) is eliminated by XeTeX's integration with the host operating system's font management. This greatly reduces the barrier to entry into the TeX world for many non-technical users, and provides a richer and more flexible typographic environment.
Because XeTeX is based on Unicode, the universal character encoding standard, and uses OpenType and AAT layout features in modern fonts to support complex non-Latin writing systems, it can work with Asian, Middle Eastern, and other traditionally difficult languages just as readily as with European languages.
XeTeX was initially designed and implemented for Mac OS X, leveraging several key technologies available on that platform. This meant, however, that it was available only to a fairly small minority of potential users. With the introduction of XeTeX for Linux, the benefits of XeTeX become available to a new and wider community of users.
Introduction
XeTeX¹ is an extension of the TeX processor, designed to integrate TeX's typesetting language and document formatting capabilities with the Unicode/ISO 10646 universal character encoding for all the world's scripts, and with the font technologies available on today's computer systems. This includes fonts that support complex non-Latin writing systems and very large character sets, as well as the wide variety of Western typefaces now available.
XeTeX is in fact based on e-TeX, and therefore includes a number of well-established extensions to TeX. These include additional registers (\count, \dimen, \box, etc.) beyond the 256 of each that TeX provides; various new conditional commands, tracing features, etc.; and, of particular significance for multilingual work, the TeX--XeT extension for bidirectional layout.
¹ The name XeTeX was inspired by the idea of a Mac OS X extension (hence the X prefix) to e-TeX; and as one of its intended uses is for bidirectional scripts such as Hebrew and Arabic, the name was designed to be reversible. The second letter should ideally be U+018E LATIN CAPITAL LETTER REVERSED E, but as few current fonts support this character, it is normal to use a rotated or reflected E glyph. The name is pronounced as if it were written "zee-TeX".
The TeX extensions inherited from e-TeX are not discussed further here, as they are already described in the e-TeX documentation², except to note that for right-to-left scripts in XeTeX, it is necessary to set \TeXXeTstate=1 and make proper use of the direction-changing commands \beginR, \endR, etc. Without these, there will still be some right-to-left behavior due to the inherent directionality defined by the Unicode standard for characters belonging to Hebrew, Arabic and similar scripts, but the overall layout will not be correct.
Using XeTeX in conjunction with higher-level macro packages such as LaTeX or ConTeXt provides a powerful and flexible typesetting system that combines the strengths of these well-developed markup systems and formatting tools with easy support for a huge range of industry-standard fonts and all the scripts and languages supported by the Unicode standard.
² E.g., e-TeX Short Reference Manual, http://www.staff.uni-mainz.de/knappen/etex_ref.html.
XIV Ogólnopolska Konferencja Polskiej Grupy Użytkowników Systemu TeX
A rich world of fonts
Font installation
In its early years, many users saw TeX as being inextricably linked with the Computer Modern typeface family created by Don Knuth specifically to work with TeX. In principle, other typefaces could be used, but few were available in a form that the TeX software could use, and few users knew how to install or access them.
As PostScript printers became widespread, TeX macro packages and supporting files (.tfms, etc.) for fonts such as Times Roman and Helvetica were created and became part of typical TeX installations. The New Font Selection Scheme (NFSS) for LaTeX played a key role in allowing users easier access to alternative typefaces. A simple \usepackage{times} in the preamble of a LaTeX document could change the fonts throughout an article in a co-ordinated fashion.
However, for most users the choice of typefaces was still limited to those for which a preconfigured LaTeX package was available. Although various tools, scripts, and articles tried to simplify and explain the steps needed, most non-technical users were still overwhelmed by the apparent complexity and the technical knowledge required. (Do I want to use OT1 or T1 encoding, or perhaps Y&Y? How do I make dvips use a .ttf font? What exactly do I put in my .fd file, and where does that file need to go? Do I need to create a virtual font? How do I activate new .map file entries? Etc., etc., with apologies to those for whom these issues are second nature.)
For an average user of a modern desktop computer and typical GUI software, using a new font in a document involves approximately two steps:
1. Drop the .ttf or .otf file into the computer's Fonts folder;
2. Select the font name from a menu in any application.
Any software (especially software that relates to typography) that requires a longer or more complex procedure will be perceived as user-unfriendly and hard to use, and will face a barrier to wide acceptance.
XeTeX aims to bring this level of simplicity to the use of fonts with TeX. While selecting a font from a menu of installed fonts does not directly fit the TeX paradigm, the use of a new font is similarly straightforward:
1. Drop the .ttf or .otf file into the computer's Fonts folder;
2. Specify the font by name in the TeX document.
In Plain TeX terms, this second step might be:
\font\myfont="Charis SIL" at 9pt
\myfont Hello World
which results in "Hello World" in the typeset document.
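For readers who want to try this, a complete plain XeTeX input file built around these two lines might look like the sketch below. It assumes Charis SIL is installed as a system font and that the file is processed with the xetex command. The /B and /I suffixes, which ask XeTeX for the bold and italic faces of the named family, are XeTeX font-name syntax not described in this paper.

```tex
% hello.tex: a minimal plain XeTeX file (process with: xetex hello.tex)
% Assumes Charis SIL is installed as a system font.
\font\myfont="Charis SIL" at 9pt      % regular face, selected by family name
\font\myfontbf="Charis SIL/B" at 9pt  % bold face of the same family
\font\myfontit="Charis SIL/I" at 9pt  % italic face of the same family
\myfont Hello World, with {\myfontbf bold} and {\myfontit italic} words.
\bye
```

No .tfm, .fd or .map file is created or consulted at any point; the font names are resolved through the operating system.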
LaTeX users do not normally declare fonts directly with TeX's \font command. Instead, they can say things like \setromanfont{Charis SIL}³ in the preamble of the document. The present article, for example, includes the lines:
\usepackage{fontspec}
\setromanfont{Adobe Garamond Pro}
\setmonofont[Scale=MatchLowercase]
{Andale Mono WT J}
These simple declarations are sufficient to use Adobe Garamond Pro (an OpenType font) as the primary typeface family throughout the article, with Andale Mono WT J (a monospaced TrueType font with an extended character set) for typewriter text, scaled to match the lowercase height of the Garamond. The fonts were installed by dropping them in the computer's Fonts folder; no additional TeX-specific steps such as file format conversions were required: no .tfms, no .fds, no .map files, etc.
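A complete XeLaTeX document using the same kind of declarations might look like this minimal sketch, assuming the two fonts named above are installed. Later releases of fontspec name the main-font command \setmainfont; \setromanfont, as used in this article, is the older name.

```tex
% minimal.tex: process with xelatex
\documentclass{article}
\usepackage{fontspec}
% Main text font (written \setromanfont in early versions of fontspec).
\setmainfont{Adobe Garamond Pro}
% Typewriter font, scaled so that its x-height matches the main font.
\setmonofont[Scale=MatchLowercase]{Andale Mono WT J}
\begin{document}
Body text set in Adobe Garamond Pro, with \texttt{monospaced text}
in Andale Mono WT J.
\end{document}
```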
Rich typographic features
Modern OpenType and AAT fonts may provide a variety of sophisticated typographic features, far beyond the simple ligatures and kerning familiar to TeX users. For example, the cursive Zapfino font contains many alternate forms for use in specific contexts, as well as alternates that can be explicitly chosen by the user:
\font\zapfino = "Zapfino" at 7pt \zapfino
A sample of Zapfino using the default
settings built in to the font.
\font\zapfiii = "Zapfino:Stylistic
variants=Third variant glyph set"
at 7pt \zapfiii
A sample of Zapfino using the third of
several variant settings.
Regular text faces may also include a number of interesting features, such as true Small Capitals, choice of lining (0123456789) or oldstyle (0123456789) numerals, automatic formation of arbitrary fractions (98/765), and others. The \font command accepts options to select whatever OpenType or AAT typographic features the font supports; or, for LaTeX users, the fontspec package provides a higher-level, unified interface to such features, independent of the particular font technology. The first sentence of this paragraph, for example, appears in the source document as:
Regular text faces may also include a
number of interesting features, such as
true{\addfontfeature{Letters=SmallCaps}
Small Capitals}, choice of lining
(0123456789) or oldstyle ({\addfontfeature
{Numbers=Lowercase}0123456789})
numerals, automatic formation of arbitrary
fractions ({\addfontfeature{Fractions=On}%
98/765}) and others.
³ This relies on the fontspec package by Will Robertson, which integrates XeTeX font support with the standard LaTeX font selection mechanisms.
Bachotek, 29 kwietnia – 2 maja 2006
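In plain XeTeX, without fontspec, similar effects can be requested directly in the \font declaration. The sketch below uses XeTeX's notation of OpenType feature tags prefixed with + after the colon in the font name (this paper shows only the AAT-style feature names used with Zapfino), and assumes that Adobe Garamond Pro provides the smcp, onum and frac features.

```tex
% Plain XeTeX: one font instance per combination of OpenType features.
% Assumes Adobe Garamond Pro is installed and supports these features.
\font\body="Adobe Garamond Pro" at 10pt
\font\scfont="Adobe Garamond Pro:+smcp" at 10pt   % small capitals
\font\osffont="Adobe Garamond Pro:+onum" at 10pt  % oldstyle numerals
\font\fracfont="Adobe Garamond Pro:+frac" at 10pt % automatic fractions
\body Regular text faces may also include true {\scfont small capitals},
oldstyle numerals ({\osffont 0123456789}) and fractions ({\fracfont 98/765}).
\bye
```

Several features can be combined in one font name, separated by semicolons (for example "Adobe Garamond Pro:+smcp;+onum"). The fontspec interface shown above hides these tags behind descriptive option names.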
Any language, any script
Unlike TeX, which treated 8-bit characters as the fundamental units of text, XeTeX is based on the Unicode character set. By default, it reads input text as Unicode (supporting both UTF-8 and UTF-16), and expects Unicode-compliant fonts so that any valid Unicode character can be directly typeset, provided the font in use supports the relevant range of Unicode.
At a simple level, this means that with Unicode-compliant fonts, a wide range of accented and other special characters can be used with no special effort; they just work:
\font\iwona="Iwona-Medium" at 9.5pt \iwona
Hej Slované, ještě naše slovanská řeč žije.
Óðinn átti tvá brœðr. Hét annarr Vé,
en annarr Vílir.
\font\charis="Charis SIL" at 9pt \charis
Dünyayı verelim çocuklara hiç değilse bir
günlüğüne.
Kur bga ep, kur Nmunas tka, ta ms
tvn, gra Lietuv.
In addition to direct input of Unicode text, it is possible to use \char with Unicode character codes, so that \char"0164\char"0119\char"015B\char"0165 will produce Ťęśť. With an appropriate font selected, even characters such as Ugaritic 𐎄 or Linear B 𐂂 can be printed using their standard Unicode codepoints (those were \char"10384 and \char"10082, using the Code2001 font).
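The sketch below collects these forms in one plain XeTeX file. It assumes Charis SIL and Code2001 are installed. The ^^^^ notation (four carets followed by exactly four lowercase hexadecimal digits) is XeTeX's extension of TeX's ^^ convention and is not described in this paper; note that \char" constants, by contrast, require uppercase hexadecimal digits, as in 015B.

```tex
% Plain XeTeX: three ways of entering the same Unicode characters.
% Assumes Charis SIL and Code2001 are installed.
\font\charis="Charis SIL" at 9pt
\font\code="Code2001" at 9pt       % a font covering Ugaritic and Linear B
\charis
Ťęśť\par                                     % 1. direct UTF-8 input
\char"0164\char"0119\char"015B\char"0165\par % 2. \char, uppercase hex digits
^^^^0164^^^^0119^^^^015b^^^^0165\par         % 3. ^^^^ notation, lowercase hex
{\code \char"10384\ \char"10082}             % code points beyond U+FFFF
\bye
```

All three of the first lines should typeset the same four characters; the last line prints the Ugaritic and Linear B characters mentioned above.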
Language-specific variants
OpenType fonts may contain variant glyphs or behavior designed to support the typographic practices of specific languages. XeTeX can access these features by adding a language code to the \font declaration; for example, Vietnamese uses different diacritic placement rules than the default stacking that is expected for arbitrary combinations of diacritics in generic Latin script:
\font\D="Doulos SIL" at 9pt
\font\V="Doulos SIL:language=VIT" at 9pt
\D cung cấp một con số duy nhất cho mỗi ký tự
\V cung cấp một con số duy nhất cho mỗi ký tự
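With LaTeX and fontspec, the same language-specific behavior can be requested through font options instead of a raw OpenType language tag. This minimal sketch assumes Doulos SIL is installed and that fontspec's Language=Vietnamese option selects the same VIT language system as the :language=VIT suffix above.

```tex
% Process with xelatex; assumes Doulos SIL is installed.
\documentclass{article}
\usepackage{fontspec}
\setmainfont{Doulos SIL}   % generic Latin behavior: default diacritic stacking
\newfontfamily\vietfont[Script=Latin,Language=Vietnamese]{Doulos SIL}
\begin{document}
cung cấp một con số duy nhất cho mỗi ký tự

{\vietfont cung cấp một con số duy nhất cho mỗi ký tự} % Vietnamese placement
\end{document}
```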
Large character sets
Because XeTeX uses Unicode as its text encoding, large character sets such as those needed for Chinese and other East Asian languages present no real difficulties. Chinese characters are simply letters in the character set, just like English; all that is required is to select an appropriate font:
\font\myfont="STFangsong" at 10pt
% select a font that supports Chinese
\myfont %
Unicode...
This would be sufficient to print the Chinese characters. An additional complication for typesetting running text is that some of these languages are written without word spaces, so that TeX has no natural opportunity to break paragraphs into lines, or to justify lines to a precise width. XeTeX solves this by offering a mechanism to find line-breaks according to the Unicode-based break rules, which can vary according to the settings of a specific locale (for example, Thai requires rules based on a dictionary to help find valid word boundaries). Further, glue can be introduced at each potential break position, so that the resulting lines of text have sufficient flexibility to be justified:
\XeTeXlinebreaklocale "zh"
% find line-break positions according
% to "zh" (Chinese) locale's rules
\XeTeXlinebreakskip = 0pt plus 1pt
% add a little stretchability to
% permit justification
Using these commands, XeTeX typesets East Asian languages just as readily as English:

Unicode

Complex-script languages
Many non-Latin writing systems involve complex rendering rules, not simply printing one character after another in a linear fashion. Unicode encodes the fundamental characters that represent the text, but the display or printing system is responsible for mapping these to the proper glyphs to produce the right visual appearance. XeTeX relies on AAT or OpenType fonts with the correct tables to support such scripts, so that they automatically work in typeset documents exactly as they work in mainstream graphical applications.
For example, in Devanagari script, the short i vowel mark appears to the left of the preceding consonant, even though it is encoded after it; and consonant clusters are written using special half-form or conjunct characters, depending on the exact letters involved. With the appropriate fonts, this is all handled transparently during the typesetting process, with no complex macros or special preprocessing of the text:
\font\dev="Devanagari MT" at 9pt
\dev !#
Similarly, Arabic uses contextual variants of the letters so
that they connect in a cursive script:
\font\arb="Geeza Pro" at 9pt
\arb
These examples use AAT fonts, which work with the Mac OS X Unicode text system to automatically render the text correctly. When using OpenType fonts, there is a minor difference: it is necessary to specify the script to be used, as OpenType relies on script-specific shaping engines to control certain aspects of the character behavior. A font may support several scripts with different behaviors, so XeTeX cannot always assume, merely from the font selected, which shaping engine should be used. Therefore, equivalent examples using OpenType fonts would look like:
\font\dev="Gargi_1.7:script=deva" at 9pt
\dev #
\font\arb="ae_AlMohanad:script=arab" at 9pt
\arb
If no script is specified for an OpenType font, XeTeX will use its generic Latin engine, which applies common features such as ligatures and diacritic positioning, if available in the font, but does not provide the contextual shaping needed by complex Asian scripts. The results would resemble the typewriter text showing the input to the XeTeX processor: the correct characters are shown, but the text as a whole is not written properly.
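LaTeX users can make the same script selection through fontspec options instead of a :script= suffix. This minimal sketch assumes the two OpenType fonts named above are installed under those names. The sample words are illustrative: हिन्दी ("Hindi") shows both the reordered short i mark and a conjunct, and العربية ("Arabic") shows cursive joining.

```tex
% Process with xelatex; assumes Gargi_1.7 and ae_AlMohanad are installed.
\documentclass{article}
\usepackage{fontspec}
% Script= selects the shaping engine, like :script=deva and :script=arab above.
\newfontfamily\devfont[Script=Devanagari]{Gargi_1.7}
\newfontfamily\arbfont[Script=Arabic]{ae_AlMohanad}
\begin{document}
{\devfont हिन्दी} % short i drawn before its consonant; conjunct for न + द

{\arbfont العربية} % contextual letter forms; a single word only, since full
                   % right-to-left paragraphs also need the TeX--XeT commands
\end{document}
```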
Multi-directional text
Not all languages and scripts are written from left to right across the page, which is TeX's natural way of typesetting. Some scripts run from right to left, and some are even written vertically.
For right-to-left text, XeTeX supports the TeX--XeT \beginR and \endR commands (and the \beginL and \endL commands, needed for left-to-right text embedded within a right-to-left environment), as implemented in e-TeX. Even without these commands, individual words in scripts such as Arabic or Hebrew will appear correctly, because the Unicode characters have directional properties, but the TeX--XeT commands must be used for overall layout to work properly. For example, a typical idiom would be:
\everypar={\setbox0=\lastbox
% save the paragraph indent
\beginR % begin R-L typesetting
\box0 } % restore indent at R side
This will cause all following paragraphs, until \everypar is reset, to default to right-to-left layout.
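A self-contained plain XeTeX file applying this idiom might look like the sketch below. The Ezra SIL font and the Hebrew sample phrase (שלום עולם, "hello world") are illustrative choices, not taken from this paper; setting \TeXXeTstate=1 enables the TeX--XeT commands, as noted in the introduction.

```tex
% Plain XeTeX: make every following paragraph right-to-left.
% Assumes the Ezra SIL Hebrew font is installed (an illustrative choice).
\TeXXeTstate=1                 % enable \beginR, \endR, \beginL, \endL
\font\heb="Ezra SIL" at 11pt
\everypar={\setbox0=\lastbox   % save the paragraph indent box
  \beginR                      % begin right-to-left typesetting
  \box0 }                      % restore the indent, now at the right-hand side
\heb
שלום עולם
\par
\bye
```

Because \everypar is applied to every new paragraph, the direction does not have to be repeated in the text; resetting \everypar={} returns to left-to-right layout.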
An additional attribute that can be specified for AAT fonts in XeTeX is vertical (appended to the font name as :vertical, as in figure 1). This causes the text rendering system to use vertical text-layout techniques, although it does not in itself re-orient the overall layout. Typically, glyphs will be rotated 90° counter-clockwise and laid out according to their vertical rather than horizontal metrics.
If this capability is combined with macros that rotate the text block as a whole, which is readily achieved through graphic transformations in the output driver (see figure 1), it becomes possible to typeset languages such as Chinese using a traditional vertical layout. Figure 2 shows a sample text formatted in both horizontal and vertical styles. (The figure here is generated by code similar to that shown in figure 1, but the rotation to produce vertical text is applied just within a single minipage rather than to the entire page via the \output routine.) Note how certain glyphs, such as the brackets, do not undergo the same rotation as the rest of the text; the AAT vertical attribute automatically gives the correct behavior here.
XeTeX escapes from its nest
In the beginning
The XeTeX program was begun as a project to integrate the rich international text and font support in Mac OS X with the TeX formatting engine. The approach of leveraging existing system libraries to handle Unicode, complex fonts and typographic features, graphics and PDF meant that a robust and highly functional system could be assembled with relatively little effort.
\newif\ifVertical \Verticaltrue % \Verticalfalse gives horizontal layout
\vsize=7in \hsize=4.5in \def\Vert{} % set up page size
\ifVertical % set parameters for vertical layout
\hsize=7in \vsize=4.5in \def\Vert{:vertical} % attribute used in font defs
% macro to rotate a box of Chinese text set with the "vertical" font attribute
\def\ChineseBox#1{\setbox0=\vbox{\boxmaxdepth=0pt #1}\dimen0=\wd0 \dimen2=\ht0
\vbox to \dimen0{\hbox to \dimen2{\hfil\special{x:gsave}\special{x:rotate -90}\rlap
{\vbox to 0pt{\box0\vss}}\special{x:grestore}}\vss}}
\def\ChineseOutput{\shipout \vbox{\ChineseBox{\makeheadline \pagebody \makefootline }}
\advancepageno \ifnum \outputpenalty >-20000 \else \dosupereject \fi}
\output={\ChineseOutput} \fi
\font\body="STKaiti\Vert" at 12pt \body
\font\bold="STHeiti\Vert" at 12pt \font\title="STHeiti\Vert" at 18pt
\centerline{\title }
\bigskip
\centerline{\bold}
\XeTeXlinebreaklocale "zh"
\XeTeXlinebreakskip = 0pt plus 1pt minus 0.1pt
\medskip
\leftline{}
\par
\par
% ...etc...
Figure 1: Using a font attribute and graphic transformations to implement vertical typesetting
As a consequence of this starting point, however, XeTeX has been a single-platform system for the first two years of its existence, since the first publicly released development version in April 2004. For Mac OS X users, it has offered an alternative to traditional TeX implementations, with some exciting new capabilities (in addition to some compatibility issues, naturally!). For the great majority of TeX users, however, font support and international typography have remained serious challenges, and a Mac OS X-only system had nothing to offer them except a tantalizing glimpse of other possibilities.
Branching out
Following the initial development on the Mac OS X platform, XeTeX is now ready to stretch its wings and make its first moves into the wider TeX world on other architectures. At the time of writing (beginning of April 2006), it is possible to run XeTeX on Linux, including, of course, distributions running on standard x86-based PC systems.
In the Linux version of the system, the Mac OS X font and text APIs used in the original implementation are replaced by code using Fontconfig and FreeType for font access. Support for OpenType layout features and international text is provided by the ICU library (which is also used in the Mac OS X version, alongside the native ATSUI system). Graphics support, originally based on Apple's QuickTime, is provided through the ImageMagick library on Linux. These technologies have enabled the creation of a Linux-based XeTeX formatting engine with the same capabilities as the Mac OS X version, except that there is no support for AAT font features, as AAT fonts are not normally used on non-Apple platforms.
The remaining part required for a complete system is an output driver that handles .xdv files, the extended .dvi format that XeTeX generates. On Mac OS X, this was implemented using the Quartz 2D graphics system. As a replacement, an extended version of the DVIPDFMx driver has been created, thanks to generous assistance from Jin-Hwan Cho (one of the primary authors of that driver). This provides a portable PDF-generating back-end for the system.
To provide a graphical working environment, it is possible to configure the Kile TeX/LaTeX environment on Linux to run XeTeX or XeLaTeX as its typesetting process. This provides users with an editor that can work with Unicode TeX source documents, and can run the typesetting engine and view the resulting PDF with a single keystroke, making use of TrueType and OpenType fonts just as readily as typical KDE- or GNOME-based GUI applications.
Current status
At present, the Linux implementation should still be considered a prototype, and will doubtless benefit from refinement over the coming months. Packaging and installation, in particular, are at early stages. But the system seems to run well, and has been successfully built on (at least) SuSE, Ubuntu, and Gentoo; users of other distributions are invited to share their experiences and contribute any necessary patches.
Figure 2: Chinese text in horizontal and vertical formats
Looking ahead, besides refining the Linux version to ensure that it is usable on all distributions and architectures (64-bit systems will undoubtedly require some work, for example) and on other Unix-like operating systems, we also hope to adapt the code to provide a native Windows version of the tool. This will be based closely on the Linux version, except that it will need to locate installed fonts through Windows GDI instead of the Fontconfig library.
For the latest information, and downloads of both binary packages (where available) and source code (for more adventurous users), see the XeTeX web site at http://scripts.sil.org/xetex. Feedback and suggestions are always welcome, with the aim of providing a powerful and flexible typesetting system that works smoothly with today's and tomorrow's text and font technologies, and with all the world's languages and scripts.