Legendre Transformation Intro
The aim of this report is to list and explain the basic properties of the Legendre-Fenchel
transform, which is a generalization of the Legendre transform commonly encountered
in physics. The precise way in which the Legendre-Fenchel transform generalizes the
Legendre transform is carefully explained and illustrated with many examples and pictures. The understanding of the difference between the two transforms is important because the general transform which arises in statistical mechanics is the Legendre-Fenchel
transform, not the Legendre transform.
All the results contained in this report can be found in much greater mathematical
detail and rigor in [1]. The proofs of these results can also be found in that reference.
1. Definitions
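Consider a function $f(x): \mathbb{R} \to \mathbb{R}$. Its LF transform, denoted $f^*(k)$, is defined by
$$f^*(k) = \sup_{x \in \mathbb{R}} \{kx - f(x)\}. \tag{1}$$
Applying the same formula to $f^*(k)$ gives
$$f^{**}(x) = \sup_{k \in \mathbb{R}} \{kx - f^*(k)\}. \tag{2}$$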
This also corresponds to the double LF transform of $f(x)$. The double-star notation
comes from our compact notation for the LF transform:
$$f^{**} = (f^*)^* = ((f)^*)^*. \tag{3}$$
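The sup formula can be tried out numerically by replacing the supremum with a maximum over a finite grid. The sketch below is only an illustration under that assumption; the helper names `grid` and `lf_transform` and the test function $f(x) = x^2/2$ (for which $f^*(k) = k^2/2$) are choices made here, not part of the report.

```python
def grid(lo, hi, n):
    """Return n equally spaced points from lo to hi (inclusive)."""
    step = (hi - lo) / (n - 1)
    return [lo + i * step for i in range(n)]


def lf_transform(xs, fs, ks):
    """Grid approximation of f*(k) = sup_x {k*x - f(x)}.

    xs, fs: sample points and the values f(x) at those points.
    ks: points at which the transform is evaluated.
    """
    return [max(k * x - f for x, f in zip(xs, fs)) for k in ks]


xs = grid(-4.0, 4.0, 801)          # domain of f, step 0.01
ks = grid(-2.0, 2.0, 401)          # domain of f*, step 0.01
fs = [0.5 * x * x for x in xs]     # illustrative choice f(x) = x^2 / 2

f_star = lf_transform(xs, fs, ks)  # approximates k^2 / 2

# Double transform: apply the same formula to f*, now with k as the variable.
xs_back = grid(-1.0, 1.0, 201)
f_star_star = lf_transform(ks, f_star, xs_back)  # approximates x^2 / 2

max_err_single = max(abs(a - 0.5 * k * k) for a, k in zip(f_star, ks))
max_err_double = max(abs(a - 0.5 * x * x) for a, x in zip(f_star_star, xs_back))
print(max_err_single, max_err_double)
```

The grids are chosen so that the maximizing point always lies inside the sampled range; with a range that is too narrow, the grid maximum would underestimate the supremum.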
Remark 1. LF transforms can also be defined using an infimum (min) rather than a
supremum (max):
$$g_*(k) = \inf_{x \in \mathbb{R}} \{kx - g(x)\}. \tag{4}$$
Transforming one version of the LF transform to the other is just a matter of introducing
minus signs at the right places. Indeed,
$$-f^*(k) = -\sup_x \{kx - f(x)\} = \inf_x \{-kx + f(x)\}, \tag{5}$$
so that
$$(-f)_*(-k) = -f^*(k). \tag{6}$$
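Remark 2. The Legendre transform, as commonly used in physics, is defined by
$$f^*(k) = k x_k - f(x_k), \tag{7}$$
where $x_k$ is determined by solving
$$f'(x) = k. \tag{8}$$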
This form is more limited in scope than the LF transform, as it applies only to differentiable functions and, as we shall see later, to convex functions. In this sense, the LF transform
is a generalization of the Legendre transform: it reduces to the Legendre transform
when applied to convex, differentiable functions. We shall comment more on this later.
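As a quick worked instance, take the Legendre transform to be $f^*(k) = k x_k - f(x_k)$ with $x_k$ solving $f'(x_k) = k$, and apply it to the quadratic $f(x) = a x^2/2$ with $a > 0$. This function is chosen here purely for illustration; it is differentiable and convex, so the restrictions just mentioned are met.

```latex
\begin{align*}
f(x) &= \tfrac{a}{2} x^2, \qquad a > 0, \\
f'(x_k) &= a x_k = k \quad\Longrightarrow\quad x_k = \frac{k}{a}, \\
f^*(k) &= k x_k - f(x_k) = \frac{k^2}{a} - \frac{k^2}{2a} = \frac{k^2}{2a}.
\end{align*}
```

The same result follows from the sup formula, since $kx - a x^2/2$ is maximized at $x = k/a$.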
Remark 3. The LF transform is not necessarily self-inverse (we also say involutive);
that is to say, $f^{**}$ need not necessarily be equal to $f$. The equality $f^{**} = f$ is taken for
granted too often in physics; we'll see later in which cases it actually holds and in which
other cases it does not.
Remark 4. The definition of the LF transform can trivially be generalized to functions
defined on higher-dimensional spaces (i.e., functions $f(x): \mathbb{R}^n \to \mathbb{R}$, with $n$ a positive
integer) by replacing the normal real-number product $kx$ by the scalar product $k \cdot x$,
where $k$ is a vector having the same dimension as $x$.
It can be proved that the corrections to this approximation are subexponential in $n$, i.e.,
$$\ln F(k,n) = n \sup_{x \in \mathbb{R}} \{kx - f(x)\} + o(n). \tag{11}$$
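The statement can be checked numerically on a simple case. The sketch below assumes that $F(k,n)$ denotes the integral $\int_{\mathbb{R}} e^{n[kx - f(x)]}\,dx$ (the usual setting for a Laplace approximation; this definition is an assumption made here) and uses the illustrative choice $f(x) = x^2/2$, for which $\sup_x\{kx - f(x)\} = k^2/2$. The integration range, grid size and function names are arbitrary choices.

```python
import math


def log_F(k, n, f, lo=-10.0, hi=10.0, m=20001):
    """ln of a Riemann-sum approximation of the integral of
    exp(n*(k*x - f(x))) over [lo, hi], using the log-sum-exp trick
    so that large exponents do not overflow."""
    h = (hi - lo) / (m - 1)
    exponents = [n * (k * (lo + i * h) - f(lo + i * h)) for i in range(m)]
    top = max(exponents)
    total = sum(math.exp(e - top) for e in exponents)
    return top + math.log(total * h)


def f(x):
    return 0.5 * x * x  # illustrative choice of f


k = 1.0
sup_value = 0.5 * k * k  # sup_x {k*x - f(x)} for this f
rates = {n: log_F(k, n, f) / n for n in (10, 100, 1000)}
for n, rate in rates.items():
    # The gap between (1/n) ln F and the supremum shrinks as n grows.
    print(n, rate, rate - sup_value)
```

For this Gaussian case the integral is known in closed form, $F(k,n) = e^{nk^2/2}\sqrt{2\pi/n}$, so the correction term is $\tfrac{1}{2}\ln(2\pi/n)$, which is indeed $o(n)$.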
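Remark 6. In statistical mechanics, the density of states $\Omega_n(u)$ of an $n$-body system with mean energy $u$ typically scales exponentially with $n$:
$$\Omega_n(u) \approx e^{n s(u)}, \tag{12}$$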
where $s(u)$ is the microcanonical entropy function of the system. (This can be taken as
a definition of the microcanonical entropy.) Defining the canonical partition function of
the system in the usual way, i.e.,
$$Z_n(\beta) = \int \Omega_n(u)\, e^{-n\beta u}\, du, \tag{13}$$
we obtain
$$\varphi(\beta) = \lim_{n \to \infty} -\frac{1}{n} \ln Z_n(\beta) = \inf_u \{\beta u - s(u)\}. \tag{14}$$
Physically, $\varphi(\beta)$ represents the free energy of the system in the canonical ensemble. So,
what the above result shows is that the canonical free energy is the LF transform of the
microcanonical entropy ($\varphi = s_*$). The inverse result, namely $s = \varphi_*$, is not always true,
as will become clear later.
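A standard textbook illustration, not taken from this report, is a system of $n$ independent two-state units with energies $\pm 1$. Its mean energy $u$ lies in $(-1, 1)$, its entropy is $s(u) = -\frac{1+u}{2}\ln\frac{1+u}{2} - \frac{1-u}{2}\ln\frac{1-u}{2}$, and $Z_n(\beta) = (2\cosh\beta)^n$ exactly, so the free energy should be $-\ln(2\cosh\beta)$. The sketch below evaluates $\inf_u\{\beta u - s(u)\}$ on a grid (an approximation of the infimum) and compares the two; all names are illustrative.

```python
import math


def entropy(u):
    """Entropy per unit of n independent two-state units with energies +1/-1."""
    p = 0.5 * (1.0 + u)
    return -p * math.log(p) - (1.0 - p) * math.log(1.0 - p)


def free_energy(beta, us):
    """Grid approximation of phi(beta) = inf_u {beta*u - s(u)}."""
    return min(beta * u - entropy(u) for u in us)


# Mean energies strictly inside (-1, 1) so that the logarithms are defined.
us = [-0.999 + 0.001 * i for i in range(1999)]

betas = (-2.0, -0.5, 0.0, 0.5, 2.0)
for beta in betas:
    exact = -math.log(2.0 * math.cosh(beta))  # from Z_n = (2 cosh beta)^n
    print(beta, free_energy(beta, us), exact)
```

Because this entropy is concave, the inverse transform also recovers $s(u)$ from $\varphi(\beta)$ here; the text explains later why this is not guaranteed in general.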
2. Theory of LF transforms
Q1: How is the shape of $f^*(k)$ determined by the shape of $f(x)$, and vice versa?
We'll see next that these two questions are answered by using a fundamental concept of
convex analysis known as a supporting line.
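A function $f: \mathbb{R} \to \mathbb{R}$ admits a supporting line at $x$ if there exists a slope $\alpha \in \mathbb{R}$ such that
$$f(y) \ge f(x) + \alpha (y - x) \tag{15}$$
for all $y$. The supporting line is strictly supporting at $x$ if
$$f(y) > f(x) + \alpha (y - x) \tag{16}$$
holds for all $y \neq x$. For these definitions to make sense, we obviously need to have
$f < \infty$.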
Remark 7. For convenience, it is useful to replace the expression "$f$ admits a supporting line at $x$" by "$f$ is convex at $x$". So, from now on, the two expressions mean the
same (this is a definition). If $f$ does not admit a supporting line at $x$, then we shall say
that $f$ is nonconvex at $x$.
The geometrical interpretation of supporting lines is shown in Figure 1. In this figure,
we see that
- The point $a$ admits a supporting line ($f$ is convex at $a$). The supporting line has
the property that it touches $f$ at the point $(a, f(a))$ and lies beneath the graph of
$f(x)$ for all $x$; hence the term "supporting".
- The supporting line at $a$ is strictly supporting because it touches the graph of $f(x)$
only at $a$. In this case, we say that $f$ is strictly convex at $a$.
- The point $b$ does not admit any supporting lines; any line passing through $(b, f(b))$
must cross the graph of $f(x)$ at some point. In this case, we also say that $f$ is nonconvex at $b$.
Proposition 1. If $f$ admits a supporting line at $x$ and $f'(x)$ exists, then the slope of
the supporting line must be equal to $f'(x)$. In other words, for differentiable functions,
a supporting line is also a tangent line.
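The supporting-line condition is easy to test on a grid. The sketch below assumes the defining inequality $f(y) \ge f(x) + \alpha(y - x)$ for all $y$, checked only at finitely many test points, and uses the double-well function $f(x) = (x^2-1)^2$ as an illustrative example (it is not one of the functions of this report). The tangent at $x = 2$ is a supporting line, a line with a different slope through the same point is not, and no line through $(0, f(0))$ is.

```python
def f(x):
    return (x * x - 1.0) ** 2  # illustrative nonconvex function


def f_prime(x):
    return 4.0 * x * (x * x - 1.0)


def is_supporting(x, slope, ys, tol=1e-9):
    """True if the line through (x, f(x)) with the given slope lies on or
    below the graph of f at every test point in ys."""
    return all(f(y) >= f(x) + slope * (y - x) - tol for y in ys)


ys = [-3.0 + 0.01 * i for i in range(601)]        # test points in [-3, 3]
slopes = [-10.0 + 0.05 * i for i in range(401)]   # candidate slopes in [-10, 10]

tangent_supports_at_2 = is_supporting(2.0, f_prime(2.0), ys)
other_slope_supports_at_2 = is_supporting(2.0, 23.0, ys)   # f'(2) is 24, not 23
any_line_supports_at_0 = any(is_supporting(0.0, a, ys) for a in slopes)

print(tangent_supports_at_2, other_slope_supports_at_2, any_line_supports_at_0)
```

At $x = 0$ the failure is not an artifact of the slope range: a supporting line there would need slope $\le -1$ to stay below $f(1) = 0$ and slope $\ge 1$ to stay below $f(-1) = 0$.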
Before answering Q1 and Q2, let us pause briefly for two important results, which we
state without proofs.
The Webster (7th Edition), for one, defines a $\cup$-shaped function to be concave rather than convex. However,
most mathematical textbooks will agree in defining the same function to be convex.
Figure 2: Illustration of the duality property for supporting lines: points of $f$ are transformed into slopes of $f^*$, and slopes of $f$ are transformed into points of $f^*$.
Thus, from the point of view of $f(x)$, we have that the LF transform is involutive at
$x$ if and only if $f$ is convex at $x$ (in the sense of supporting lines). Changing our point of
view to $f^*(k)$, we have the following:
We'll see later with a specific example that the differentiability property of $f^*$ is
sufficient (as stated) but not necessary for $f = f^{**}$. For now, we note the following
obvious corollary:
Theorem 10. $f^{**}(x)$ is the largest convex function satisfying $f^{**}(x) \le f(x)$.
Because of this result, we call $f^{**}(x)$ the convex envelope or convex hull of $f(x)$.
We'll make the meaning of these expressions precise in the next section.
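The envelope property can be observed numerically. The sketch below approximates both transforms by maxima over finite grids (an assumption that truncates the domains of $x$ and $k$) for the illustrative double-well function $f(x) = (x^2 - 1)^2$, whose convex envelope is 0 on $[-1, 1]$ and equal to $f$ elsewhere. The function and grid choices are not from the report.

```python
def lf_transform(xs, fs, ks):
    """Grid approximation of sup_x {k*x - f(x)} for each k in ks."""
    return [max(k * x - f for x, f in zip(xs, fs)) for k in ks]


xs = [-2.0 + 0.01 * i for i in range(401)]     # x in [-2, 2]
ks = [-24.0 + 0.05 * i for i in range(961)]    # k in [-24, 24], covers f'(x) on [-2, 2]
fs = [(x * x - 1.0) ** 2 for x in xs]          # nonconvex double well

f_star = lf_transform(xs, fs, ks)
f_env = lf_transform(ks, f_star, xs)           # f** sampled on the x grid

# f** never exceeds f ...
below_f = all(e <= f + 1e-9 for e, f in zip(f_env, fs))
# ... and f** is convex: second differences on the uniform grid are nonnegative.
convex = all(f_env[i - 1] - 2.0 * f_env[i] + f_env[i + 1] >= -1e-9
             for i in range(1, len(xs) - 1))
# Inside the well the envelope is flat (value 0); outside it coincides with f.
flat_inside = max(abs(e) for x, e in zip(xs, f_env) if abs(x) <= 1.0)
gap_outside = max(abs(e - f) for x, e, f in zip(xs, f_env, fs) if abs(x) >= 1.2)
print(below_f, convex, flat_inside, gap_outside)
```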
We consider in this section a number of examples which help visualize the meaning
and application of all the results presented in the previous section. All of the examples
considered arise in statistical mechanics.
The LF transform
$$f^*(k) = \sup_{x \in \mathbb{R}} \{kx - f(x)\} \tag{17}$$
is in general evaluated by finding the critical points $x_k$ (there could be more than one)
which maximize the function
$$F(x,k) = kx - f(x). \tag{18}$$
In mathematical notation, these points are written as
$$x_k = \arg\sup_x F(x,k) = \arg\sup_x \{kx - f(x)\}, \tag{19}$$
where "arg sup" reads "arguments of the supremum" and means, in words, "points at which
the maximum occurs".
Now, assume that $f(x)$ is everywhere differentiable. Then, we can find the maximum
of $F(x,k)$ using the common rules of calculus by solving
$$\frac{\partial}{\partial x} F(x,k) = 0 \tag{20}$$
for a fixed value of $k$. Given the form of $F(x,k)$, this is equivalent to solving
$$k = f'(x) \tag{21}$$
for $x$ given $k$. As noted before, there could be more than one critical point of $F(x,k)$
solving the above equation. To make sure that there is actually
only one solution for every $k \in \mathbb{R}$, we need to impose the following two conditions on
$f$:
1. $f'(x)$ is continuous and strictly increasing in $x$;
2. $f'(x) \to \infty$ as $x \to \infty$ and $f'(x) \to -\infty$ as $x \to -\infty$.
Given these, we are assured that there exists a unique value $x_k$ for each $k \in \mathbb{R}$ satisfying
$k = f'(x_k)$ and which maximizes $F(x,k)$. As a result, we can write
$$f^*(k) = k x_k - f(x_k), \tag{22}$$
where
$$f'(x_k) = k. \tag{23}$$
These two equations define precisely what the Legendre transform of $f(x)$ is (as
opposed to the LF transform, which is defined with the sup formula). Accordingly, we
have proved that the LF transform reduces to the Legendre transform for differentiable
and strictly convex functions. (The strictly convex property results from the monotonicity of $f'(x)$.) Since $f(x)$ at this point is convex by assumption, we must have $f = f^{**}$
for all $x$. Therefore, the Legendre transform must be involutive (always), and the inverse
Legendre transform is the Legendre transform itself; in symbols,
$$f(x) = k_x x - f^*(k_x), \tag{24}$$
where
$$f^{*\prime}(k_x) = x. \tag{25}$$
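The two-step recipe (solve $f'(x_k) = k$, then evaluate $k x_k - f(x_k)$) can be sketched in code. The example below uses the illustrative choice $f(x) = \cosh x$, which is differentiable and strictly convex with $f'(x) = \sinh x$ ranging over all of $\mathbb{R}$, so that $x_k = \operatorname{asinh}(k)$ and $f^*(k) = k \operatorname{asinh}(k) - \sqrt{1 + k^2}$. The bisection solver and its search bracket are assumptions of this sketch, not a method prescribed by the report.

```python
import math


def solve_increasing(g, target, lo=-50.0, hi=50.0, iters=200):
    """Bisection for g(x) = target, assuming g is continuous and strictly
    increasing and that the solution lies in [lo, hi]."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def legendre(f, f_prime, k):
    """Legendre transform: k*x_k - f(x_k), with x_k the solution of f'(x_k) = k."""
    x_k = solve_increasing(f_prime, k)
    return k * x_k - f(x_k)


def f_star(k):
    # Transform of the illustrative choice f(x) = cosh(x), with f'(x) = sinh(x).
    return legendre(math.cosh, math.sinh, k)


def f_back(x):
    # Transforming again; for this f the derivative of f* is x_k = asinh(k).
    return legendre(f_star, math.asinh, x)


for k in (-3.0, 0.0, 1.0):
    print(k, f_star(k), k * math.asinh(k) - math.sqrt(1.0 + k * k))
for x in (-1.0, 0.5, 2.0):
    print(x, f_back(x), math.cosh(x))  # the double transform returns f
```

The second loop illustrates the involution: applying the same recipe to $f^*$ gives back $\cosh x$ up to rounding.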
The case of functions having more than one nondifferentiable point is treated similarly by considering each nondifferentiable point separately.
Since $f(x)$ in the previous example is convex, $f^{**}(x) = f(x)$ for all $x$, and so the roles
of $f$ and $f^*$ can be inverted to obtain the following: a convex function $f(x)$ having an
affine part has a LF transform $f^*(k)$ having one nondifferentiable point; see Figure 4.
More precisely, if $f(x)$ is affine over $(x_l, x_h)$ with slope $k_c$ in that interval, then $f^*(k)$
will have a nondifferentiable point at $k_c$ with left- and right-derivatives at $k_c$ given by $x_l$
and $x_h$, respectively.

Figure 4: Nondifferentiable points are transformed into affine parts under the action of
the LF transform and vice versa.
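This correspondence can be checked on a simple convex function. The sketch below uses the illustrative choice $f(x) = (x-1)^2/2 + |x - 1|$, which has a single nondifferentiable point at $x_c = 1$ with left- and right-derivatives $-1$ and $+1$. A direct calculation gives $f^*(k) = k + \max(|k| - 1, 0)^2/2$, which is affine with slope $x_c = 1$ over $[-1, 1]$. The supremum is approximated by a maximum over a grid; the function and names are not from the report.

```python
def lf_transform(xs, fs, ks):
    """Grid approximation of sup_x {k*x - f(x)} for each k in ks."""
    return [max(k * x - f for x, f in zip(xs, fs)) for k in ks]


xs = [-5.0 + 0.01 * i for i in range(1001)]              # x in [-5, 5]
fs = [0.5 * (x - 1.0) ** 2 + abs(x - 1.0) for x in xs]   # kink at x_c = 1
ks = [-2.0 + 0.25 * i for i in range(17)]                # k in [-2, 2]

f_star = lf_transform(xs, fs, ks)
exact = [k + 0.5 * max(abs(k) - 1.0, 0.0) ** 2 for k in ks]
max_err = max(abs(a - b) for a, b in zip(f_star, exact))

# Slopes of f* between neighbouring k values inside [-1, 1]: all equal to x_c = 1.
inner = [(k, v) for k, v in zip(ks, f_star) if abs(k) <= 1.0]
slopes = [(v2 - v1) / (k2 - k1) for (k1, v1), (k2, v2) in zip(inner, inner[1:])]
print(max_err, slopes)
```

Reading the same pair in the other direction, $f^*$ is a convex function with an affine part of slope 1 over $(-1, 1)$, and its transform $f$ has a nondifferentiable point at 1 with one-sided derivatives $-1$ and $+1$.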
Consider the function $f(x)$ shown in Figure 5. This function has the particularity of being
defined only on a bounded domain of $x$-values, which we denote by $[x_l, x_h]$. Furthermore, $|f'(x)| \to \infty$ as $x \to x_l + 0$ and $x \to x_h - 0$ (the derivative of $f$ blows up near
the boundaries). Outside the interval of definition of $f(x)$, we formally set $f(x) = \infty$.
To determine the shape of $f^*(k)$, we use again what we know about supporting lines
of $f$ and $f^*$. All points $(x, f(x))$ with $x \in (x_l, x_h)$ admit a strict supporting line with
slope $k(x)$. These points are represented at the level of $f^*$ by points $(k(x), f^*(k(x)))$
having a supporting line of slope $x$. As $x$ approaches $x_l$ from the right, the slope of $f(x)$
diverges to $-\infty$. At the level of $f^*$, this implies that the slope of the supporting line of
$f^*$ approaches $x_l$ as $k \to -\infty$. Similarly, since the slope of $f(x)$ goes to $+\infty$ as $x \to x_h$,
the slope of the supporting line of $f^*$ approaches the value $x_h$ as $k \to +\infty$; see Figure 5.
Note, finally, that $f^{**} = f$ since $f$ is convex. This means that we can invert the roles
of $f$ and $f^*$ in this example just like in the previous one to obtain the following: the LF
transform of a convex function which is asymptotically linear is a convex function which
is finite on a bounded domain with diverging slopes at the boundaries.
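A concrete instance, chosen here for illustration, is the half-circle $f(x) = -\sqrt{1 - x^2}$ on $[x_l, x_h] = [-1, 1]$, whose slope diverges at both boundaries and whose transform is $f^*(k) = \sqrt{1 + k^2}$, asymptotically linear with slopes $\pm 1$. The sketch below compares a grid approximation of the supremum with this closed form; the grid resolution limits the accuracy at large $|k|$.

```python
import math


def lf_transform(xs, fs, ks):
    """Grid approximation of sup_x {k*x - f(x)} for each k in ks."""
    return [max(k * x - f for x, f in zip(xs, fs)) for k in ks]


# f is finite only on [-1, 1]; restricting the grid to that interval plays
# the role of setting f = +infinity outside it.
xs = [-1.0 + 0.001 * i for i in range(2001)]
fs = [-math.sqrt(max(0.0, 1.0 - x * x)) for x in xs]

ks = [-50.0, -40.0, -2.0, -1.0, 0.0, 1.0, 2.0, 40.0, 50.0]
f_star = dict(zip(ks, lf_transform(xs, fs, ks)))

for k in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(k, f_star[k], math.sqrt(1.0 + k * k))

# Slopes of f* far out on each side approach x_l = -1 and x_h = +1.
slope_left = (f_star[-40.0] - f_star[-50.0]) / 10.0
slope_right = (f_star[50.0] - f_star[40.0]) / 10.0
print(slope_left, slope_right)
```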
Figure 5: Function defined on a bounded domain with diverging slopes at boundaries; its
LF transform is asymptotically linear as $|k| \to \infty$.

Consider now a variation of the previous example. Rather than having diverging slopes
at the boundaries $x_l$ and $x_h$, we assume that $f(x)$ has finite slopes at these points. We
denote the right-derivative of $f$ at $x_l$ by $k_l$ and its left-derivative at $x_h$ by $k_h$.
For this example, everything works as in the previous example except that we have
to be careful about the boundary points. As in the case of nondifferentiable points, $f$ at
$x_h$ admits not one but infinitely many supporting lines with slopes taking values in the
range $[k_h, \infty)$. At the level of $f^*$, this means that all points $(k, f^*(k))$ with $k \in [k_h, \infty)$
have supporting lines with constant slope $x_h$; that is, $f^*(k)$ is affine past $k_h$ with slope
$x_h$. Likewise, $f$ at $x_l$ admits an infinite number of supporting lines with slopes now
ranging from $-\infty$ to $k_l$. As a consequence, $f^*$ must be affine over the range $(-\infty, k_l)$
with constant slope $x_l$; see Figure 6.

Figure 6: Function defined on a bounded domain with finite slopes at boundaries; its LF
transform has affine parts outside some interior domain.
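For a concrete check, take the illustrative choice $f(x) = x^2/2$ restricted to $[x_l, x_h] = [-1, 1]$, so that $k_l = -1$ and $k_h = 1$. A direct calculation gives $f^*(k) = k^2/2$ for $|k| \le 1$ and $f^*(k) = |k| - 1/2$ otherwise, i.e., affine tails with slopes $x_l$ and $x_h$. The sketch below reproduces this on a grid; the function and names are assumptions of the example.

```python
def lf_transform(xs, fs, ks):
    """Grid approximation of sup_x {k*x - f(x)} for each k in ks."""
    return [max(k * x - f for x, f in zip(xs, fs)) for k in ks]


xs = [-1.0 + 0.01 * i for i in range(201)]   # f is finite only on [-1, 1]
fs = [0.5 * x * x for x in xs]
ks = [-3.0 + 0.5 * i for i in range(13)]     # k in [-3, 3]

f_star = lf_transform(xs, fs, ks)
exact = [0.5 * k * k if abs(k) <= 1.0 else abs(k) - 0.5 for k in ks]
max_err = max(abs(a - b) for a, b in zip(f_star, exact))

# Past k_h = 1 the transform is affine with slope x_h = 1;
# below k_l = -1 it is affine with slope x_l = -1.
slope_right = (f_star[-1] - f_star[-2]) / (ks[-1] - ks[-2])
slope_left = (f_star[1] - f_star[0]) / (ks[1] - ks[0])
print(max_err, slope_left, slope_right)
```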
Our last example is quite interesting, as it illustrates the precise case for which the LF
transform is not involutive, namely nonconvex functions.
The function that we consider is shown in Figure 7; it has three branches having the
following properties:
- Branch a: The points on this branch, which extends from $x = -\infty$ to $x_l$, admit
strict supporting lines. This branch is thus transformed into a differentiable branch
of $f^*(k)$.
- Branch c: None of the points on this branch, which extends over $(x_l, x_h)$, admit
supporting lines. This means that these points are not represented at the level of
$f^*$. In other words, there is not one point of $f^*$ which admits a supporting line
with slope in the range $(x_l, x_h)$. (That would contradict the fact that $f^*$ has a
supporting line at $k$ with slope $x$ if and only if $f$ admits a supporting line at $x$ with
slope $k$.)
These three observations have two important consequences (see Figure 7):
1. $f^*(k)$ must have a nondifferentiable point at $k_c$, with $k_c$ equal to the slope of the
supporting line connecting the two points $(x_l, f(x_l))$ and $(x_h, f(x_h))$. This follows
since $x_l$ and $x_h$ share the same supporting line of slope $k_c$. Thus, in a way, $f^*$ must
have two slopes at $k_c$.
2. Define the convex extrapolation of $f(x)$ to be the function obtained by replacing
the nonconvex branch of $f(x)$ (branch c) by the supporting line connecting the
two convex branches of $f$ (a and b). Then, both the LF transforms of $f$ and
its convex extrapolation yield $f^*$. This is evident from our previous working of
nondifferentiable and affine functions. It should also be evident from the example
of nondifferentiable functions that the convex extrapolation of $f$ is nothing but
$f^{**}$, the double LF transform of $f$. This explains why we call $f^{**}$ the convex
envelope of $f$.
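Both consequences can be observed numerically. The sketch below uses the illustrative double-well function $f(x) = (x^2 - 1)^2$, for which $x_l = -1$, $x_h = 1$ and $k_c = 0$, together with its convex extrapolation (zero on $[-1, 1]$, equal to $f$ elsewhere). The supremum is approximated by a maximum over a grid and the one-sided derivatives by finite differences; none of these choices come from the report.

```python
def lf_transform(xs, fs, ks):
    """Grid approximation of sup_x {k*x - f(x)} for each k in ks."""
    return [max(k * x - f for x, f in zip(xs, fs)) for k in ks]


xs = [-2.0 + 0.01 * i for i in range(401)]
f_vals = [(x * x - 1.0) ** 2 for x in xs]
# Convex extrapolation: branch c replaced by the supporting line of slope k_c = 0.
f_conv = [0.0 if abs(x) <= 1.0 else f for x, f in zip(xs, f_vals)]

ks = [-1.0 + 0.01 * i for i in range(201)]   # k in [-1, 1], k_c = 0 at index 100
star = lf_transform(xs, f_vals, ks)
star_conv = lf_transform(xs, f_conv, ks)

# Consequence 2: both functions have the same LF transform.
max_diff = max(abs(a - b) for a, b in zip(star, star_conv))

# Consequence 1: f* has a kink at k_c = 0, with one-sided slopes
# close to x_l = -1 (left) and x_h = +1 (right).
left_slope = (star[100] - star[99]) / 0.01
right_slope = (star[101] - star[100]) / 0.01
print(max_diff, left_slope, right_slope)
```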
That is, in this case, the LF transform is involutive (see Theorem 2.4).
Overall, this means that the LF transform has the following structure:
$$f \to f^* \rightleftarrows f^{**}, \tag{28}$$
where the arrows stand for the LF transform; see Figure 8. This diagram clearly shows
that the LF transform is non-involutive in general. For convex functions, i.e., functions
admitting supporting lines everywhere, the diagram reduces to
$$f \rightleftarrows f^*. \tag{29}$$
The complete structure of the LF transform for general functions goes as follows:
$$f \to f^* \rightleftarrows f^{**}, \tag{30}$$
where the arrows denote the LF transform. For convex functions ($f = f^{**}$), this
reduces to
$$f \rightleftarrows f^*. \tag{31}$$
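The structure can be verified on a grid, where it holds exactly for the discretized transforms: transforming $f^{**}$ once more returns $f^*$, while $f^{**}$ differs from $f$ when $f$ is nonconvex. The sketch below uses the illustrative double-well function $f(x) = (x^2-1)^2$ and replaces each supremum by a maximum over a finite grid; the function and grids are assumptions of the example.

```python
def lf_transform(xs, fs, ks):
    """Grid approximation of sup_x {k*x - f(x)} for each k in ks."""
    return [max(k * x - f for x, f in zip(xs, fs)) for k in ks]


xs = [-2.0 + 0.02 * i for i in range(201)]    # x in [-2, 2]
ks = [-24.0 + 0.1 * i for i in range(481)]    # k in [-24, 24]
f0 = [(x * x - 1.0) ** 2 for x in xs]         # nonconvex f

f1 = lf_transform(xs, f0, ks)   # f*
f2 = lf_transform(ks, f1, xs)   # f**
f3 = lf_transform(xs, f2, ks)   # f***

# The cycle f* <-> f** closes: the triple transform equals the single one.
triple_vs_single = max(abs(a - b) for a, b in zip(f3, f1))
# The first arrow is one-way: f** is not f for this nonconvex function.
double_vs_original = max(abs(a - b) for a, b in zip(f2, f0))
print(triple_vs_single, double_vs_original)
```

For a convex choice of `f0`, `double_vs_original` would also be close to zero, which is the reduced diagram.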
- The LF transform is more general than the Legendre transform because it applies
to nonconvex functions as well as nondifferentiable functions.
- The LF transform reduces to the Legendre transform in the case of convex, differentiable functions.
Acknowledgments
This report was written while visiting the Rockefeller University, New York, in July
2005. I wish to thank E.G.D. Cohen for his invitation to visit Rockefeller, as well as
for his many comments on the content and presentation of this report, and for his contagious desire to make things clear. My work is supported by the Natural Sciences and
Engineering Research Council of Canada (Post-Doctoral Fellowship).
References