Power Laws, Pareto Distributions and Zipf's Law
Power Laws, Pareto Distributions and Zipf's Law
M. E. J. Newman
Department of Physics and Center for the Study of Complex Systems, University of Michigan, Ann Arbor,
MI 48109. U.S.A.
arXiv:cond-mat/0412004v3 [cond-mat.stat-mech] 29 May 2006
When the probability of measuring a particular value of some quantity varies inversely as a power
of that value, the quantity is said to follow a power law, also known variously as Zipf’s law or the
Pareto distribution. Power laws appear widely in physics, biology, earth and planetary sciences,
economics and finance, computer science, demography and the social sciences. For instance,
the distributions of the sizes of cities, earthquakes, solar flares, moon craters, wars and people’s
personal fortunes all appear to follow power laws. The origin of power-law behaviour has been
a topic of debate in the scientific community for more than a century. Here we review some of
the empirical evidence for the existence of power-law forms and the theories proposed to explain
them.
6 4
percentage
4
2
2
1
0 0
0 50 100 150 200 250 0 20 40 60 80 100
FIG. 1 Left: histogram of heights in centimetres of American males. Data from the National Health Examination Survey,
1959–1962 (US Department of Health and Human Services). Right: histogram of speeds in miles per hour of cars on UK
motorways. Data from Transport Statistics 2003 (UK Department for Transport).
-2
0.004 10
-3
percentage of cities
10
0.003
-4
10
-5
0.002 10
-6
10
0.001
-7
10
-8
0 10
0 5 5 4 5 6 7
2×10 4×10 10 10 10 10
population of city
FIG. 2 Left: histogram of the populations of all US cities with population of 10 000 or more. Right: another histogram of the
same data, but plotted on logarithmic scales. The approximate straight-line form of the histogram in the right panel implies
that the distribution follows a power law. Data from the 2000 US Census.
is fixed, it is determined by the requirement that the Power-law distributions are the subject of this arti-
distribution p(x) sum to 1; see Section III.A.) cle. In the following sections, I discuss ways of detecting
Power-law distributions occur in an extraordinarily di- power-law behaviour, give empirical evidence for power
verse range of phenomena. In addition to city popula- laws in a variety of systems and describe some of the
tions, the sizes of earthquakes [3], moon craters [4], solar mechanisms by which power-law behaviour can arise.
flares [5], computer files [6] and wars [7], the frequency of Readers interested in pursuing the subject further may
use of words in any human language [2, 8], the frequency also wish to consult the reviews by Sornette [18] and
of occurrence of personal names in most cultures [9], the Mitzenmacher [19], as well as the bibliography by Li.2
numbers of papers scientists write [10], the number of
citations received by papers [11], the number of hits on
web pages [12], the sales of books, music recordings and
almost every other branded commodity [13, 14], the num- tical distributions of quantities. For instance, Newton’s famous
bers of species in biological taxa [15], people’s annual in- 1/r 2 law for gravity has a power-law form with exponent α = 2.
comes [16] and a host of other variables all follow power- While such laws are certainly interesting in their own way, they
law distributions.1 are not the topic of this paper. Thus, for instance, there has
in recent years been some discussion of the “allometric” scal-
ing laws seen in the physiognomy and physiology of biological
organisms [17], but since these are not statistical distributions
they will not be discussed here.
1 2 http://linkage.rockefeller.edu/wli/zipf/.
Power laws also occur in many situations other than the statis-
II Measuring power laws 3
1.5 0
(a) 10 (b)
-1
10
1
samples
samples
-2
10
-3
0.5 10
-4
10
-5
0 10
0 2 4 6 8 1 10 100
x x
0
10
(d)
-3
samples
10 -2
10
-5
10
-4
-7 10
10
-9
10
1 10 100 1000 1 10 100 1000
x x
FIG. 3 (a) Histogram of the set of 1 million random numbers described in the text, which have a power-law distribution with
exponent α = 2.5. (b) The same histogram on logarithmic scales. Notice how noisy the results get in the tail towards the
right-hand side of the panel. This happens because the number of samples in the bins becomes small and statistical fluctuations
are therefore large as a fraction of sample number. (c) A histogram constructed using “logarithmic binning”. (d) A cumulative
histogram or rank/frequency plot of the same data. The cumulative distribution also follows a power law, but with an exponent
of α − 1 = 1.5.
II. MEASURING POWER LAWS numbers, produced by binning them into bins of equal
size 0.1. That is, the first bin goes from 1 to 1.1, the
Identifying power-law behaviour in either natural or second from 1.1 to 1.2, and so forth. On the linear scales
man-made systems can be tricky. The standard strategy used this produces a nice smooth curve.
makes use of a result we have already seen: a histogram To reveal the power-law form of the distribution it is
of a quantity with a power-law distribution appears as better, as we have seen, to plot the histogram on logarith-
a straight line when plotted on logarithmic scales. Just mic scales, and when we do this for the current data we
making a simple histogram, however, and plotting it on see the characteristic straight-line form of the power-law
log scales to see if it looks straight is, in most cases, a distribution, Fig. 3b. However, the plot is in some re-
poor way proceed. spects not a very good one. In particular the right-hand
Consider Fig. 3. This example shows a fake data set: end of the distribution is noisy because of sampling er-
I have generated a million random real numbers drawn rors. The power-law distribution dwindles in this region,
from a power-law probability distribution p(x) = Cx−α meaning that each bin only has a few samples in it, if
with exponent α = 2.5, just for illustrative purposes.3 any. So the fractional fluctuations in the bin counts are
Panel (a) of the figure shows a normal histogram of the large and this appears as a noisy curve on the plot. One
way to deal with this would be simply to throw out the
data in the tail of the curve. But there is often useful in-
formation in those data and furthermore, as we will see
3 This can be done using the so-called transformation method. If in Section II.A, many distributions follow a power law
we can generate a random real number r uniformly distributed in only in the tail, so we are in danger of throwing out the
the range 0 ≤ r < 1, then x = xmin (1 − r)−1/(α−1) is a random baby with the bathwater.
power-law-distributed real number in the range xmin ≤ x < ∞ An alternative solution is to vary the width of the bins
with exponent α. Note that there has to be a lower limit xmin
on the range; the power-law distribution diverges as x → 0—see
in the histogram. If we are going to do this, we must
Section II.A. also normalize the sample counts by the width of the
4 Power laws, Pareto distributions and Zipf’s law
bins they fall in. That is, the number of samples in a bin which is 1 less than the original exponent. Thus, if we
of width ∆x should be divided by ∆x to get a count per plot P (x) on logarithmic scales we should again get a
unit interval of x. Then the normalized sample count straight line, but with a shallower slope.
becomes independent of bin width on average and we are But notice that there is no need to bin the data at
free to vary the bin widths as we like. The most common all to calculate P (x). By its definition, P (x) is well-
choice is to create bins such that each is a fixed multiple defined for every value of x and so can be plotted as a
wider than the one before it. This is known as loga- perfectly normal function without binning. This avoids
rithmic binning. For the present example, for instance, all questions about what sizes the bins should be. It
we might choose a multiplier of 2 and create bins that also makes much better use of the data: binning of data
span the intervals 1 to 1.1, 1.1 to 1.3, 1.3 to 1.7 and so lumps all samples within a given range together into the
forth (i.e., the sizes of the bins are 0.1, 0.2, 0.4 and so same bin and so throws out any information that was
forth). This means the bins in the tail of the distribu- contained in the individual values of the samples within
tion get more samples than they would if bin sizes were that range. Cumulative distributions don’t throw away
fixed, and this reduces the statistical errors in the tail. It any information; it’s all there in the plot.
also has the nice side-effect that the bins appear to be of Figure 3d shows our computer-generated power-law
constant width when we plot the histogram on log scales. data as a cumulative distribution, and indeed we again
I used logarithmic binning in the construction of see the tell-tale straight-line form of the power law, but
Fig. 2b, which is why the points representing the individ- with a shallower slope than before. Cumulative distribu-
ual bins appear equally spaced. In Fig. 3c I have done tions like this are sometimes also called rank/frequency
the same for our computer-generated power-law data. As plots for reasons explained in Appendix A. Cumula-
we can see, the straight-line power-law form of the his- tive distributions with a power-law form are sometimes
togram is now much clearer and can be seen to extend for said to follow Zipf ’s law or a Pareto distribution, af-
at least a decade further than was apparent in Fig. 3b. ter two early researchers who championed their study.
Even with logarithmic binning there is still some noise Since power-law cumulative distributions imply a power-
in the tail, although it is sharply decreased. Suppose the law form for p(x), “Zipf’s law” and “Pareto distribu-
bottom of the lowest bin is at xmin and the ratio of the tion” are effectively synonymous with “power-law distri-
widths of successive bins is a. Then the kth bin extends bution”. (Zipf’s law and the Pareto distribution differ
from xk−1 = xmin ak−1 to xk = xmin ak and the expected from one another in the way the cumulative distribution
number of samples falling in this interval is is plotted—Zipf made his plots with x on the horizon-
Z xk Z xk tal axis and P (x) on the vertical one; Pareto did it the
p(x) dx = C x−α dx other way around. This causes much confusion in the lit-
xk−1 xk−1 erature, but the data depicted in the plots are of course
aα−1
−1 identical.4 )
=C (xmin ak )−α+1 . (2) We know the value of the exponent α for our artifi-
α−1
cial data set since it was generated deliberately to have
Thus, so long as α > 1, the number of samples per bin a particular value, but in practical situations we would
goes down as k increases and the bins in the tail will have often like to estimate α from observed data. One way
more statistical noise than those that precede them. As to do this would be to fit the slope of the line in plots
we will see in the next section, most power-law distribu- like Figs. 3b, c or d, and this is the most commonly used
tions occurring in nature have 2 ≤ α ≤ 3, so noisy tails method. Unfortunately, it is known to introduce system-
are the norm. atic biases into the value of the exponent [20], so it should
Another, and in many ways a superior, method of plot- not be relied upon. For example, a least-squares fit of a
ting the data is to calculate a cumulative distribution straight line to Fig. 3b gives α = 2.26 ± 0.02, which is
function. Instead of plotting a simple histogram of the clearly incompatible with the known value of α = 2.5
data, we make a plot of the probability P (x) that x has from which the data were generated.
a value greater than or equal to x: An alternative, simple and reliable method for extract-
Z ∞ ing the exponent is to employ the formula
P (x) = p(x′ ) dx′ . (3) " n #−1
x X xi
α=1+n ln . (5)
The plot we get is no longer a simple representation of i=1
xmin
the distribution of the data, but it is useful nonetheless.
If the distribution follows a power law p(x) = Cx−α , then Here the quantities xi , i = 1 . . . n are the measured values
of x and xmin is again the minimum value of x. (As
Z ∞
−α C
P (x) = C x′ dx′ = x−(α−1) . (4)
x α−1
Thus the cumulative distribution function P (x) also fol- 4 See http://www.hpl.hp.com/research/idl/papers/ranking/
lows a power law, but with a different exponent α − 1, for a useful discussion of these and related points.
II Measuring power laws 5
discussed in the following section, in practical situations shows the cumulative distribution of the number of
xmin usually corresponds not to the smallest value of x citations received by a paper between publication
measured but to the smallest for which the power-law and June 1997.
behaviour holds.) An estimate of the expected statistical
error σ on (5) is given by (c) Web hits: The cumulative distribution of the
number of “hits” received by web sites (i.e., servers,
" n #−1 not pages) during a single day from a subset of the
√ X xi α−1
σ= n ln = √ . (6) users of the AOL Internet service. The site with
i=1
xmin n the most hits, by a long way, was yahoo.com. Af-
ter Adamic and Huberman [12].
The derivation of both these formulas is given in Ap-
pendix B. (d) Copies of books sold: The cumulative distribu-
Applying Eqs. (5) and (6) to our present data gives an tion of the total number of copies sold in Amer-
estimate of α = 2.500 ± 0.002 for the exponent, which ica of the 633 bestselling books that sold 2 million
agrees well with the known value of 2.5. or more copies between 1895 and 1965. The data
were compiled painstakingly over a period of sev-
eral decades by Alice Hackett, an editor at Pub-
A. Examples of power laws lisher’s Weekly [23]. The best selling book dur-
ing the period covered was Benjamin Spock’s The
In Fig. 4 we show cumulative distributions of twelve Common Sense Book of Baby and Child Care. (The
different quantities measured in physical, biological, tech- Bible, which certainly sold more copies, is not really
nological and social systems of various kinds. All have a single book, but exists in many different transla-
been proposed to follow power laws over some part of tions, versions and publications, and was excluded
their range. The ubiquity of power-law behaviour in the by Hackett from her statistics.) Substantially bet-
natural world has led many scientists to wonder whether ter data on book sales than Hackett’s are now avail-
there is a single, simple, underlying mechanism link- able from operations such as Nielsen BookScan, but
ing all these different systems together. Several candi- unfortunately at a price this author cannot afford.
dates for such mechanisms have been proposed, going by I should be very interested to see a plot of sales
names like “self-organized criticality” and “highly opti- figures from such a modern source.
mized tolerance”. However, the conventional wisdom is
that there are actually many different mechanisms for (e) Telephone calls: The cumulative distribution of
producing power laws and that different ones are appli- the number of calls received on a single day by 51
cable to different cases. We discuss these points further million users of AT&T long distance telephone ser-
in Section IV. vice in the United States. After Aiello et al. [24].
The distributions shown in Fig. 4 are as follows. The largest number of calls received by a customer
in that day was 375 746, or about 260 calls a minute
(a) Word frequency: Estoup [8] observed that the (obviously to a telephone number that has many
frequency with which words are used appears to fol- people manning the phones). Similar distributions
low a power law, and this observation was famously are seen for the number of calls placed by users and
examined in depth and confirmed by Zipf [2]. also for the numbers of email messages that people
Panel (a) of Fig. 4 shows the cumulative distribu- send and receive [25, 26].
tion of the number of times that words occur in a
typical piece of English text, in this case the text of (f) Magnitude of earthquakes: The cumulative dis-
the novel Moby Dick by Herman Melville.5 Similar tribution of the Richter (local) magnitude of earth-
distributions are seen for words in other languages. quakes occurring in California between January
1910 and May 1992, as recorded in the Berkeley
(b) Citations of scientific papers: As first observed Earthquake Catalog. The Richter magnitude is de-
by Price [11], the numbers of citations received by fined as the logarithm, base 10, of the maximum
scientific papers appear to have a power-law distri- amplitude of motion detected in the earthquake,
bution. The data in panel (b) are taken from the and hence the horizontal scale in the plot, which
Science Citation Index, as collated by Redner [22], is drawn as linear, is in effect a logarithmic scale
and are for papers published in 1981. The plot of amplitude. The power law relationship in the
earthquake distribution is thus a relationship be-
tween amplitude and frequency of occurrence. The
data are from the National Geophysical Data Cen-
5 The most common words in this case are, in order, “the”, “of”,
ter, www.ngdc.noaa.gov.
“and”, “a” and “to”, and the same is true for most written En-
glish texts. Interestingly, however, it is not true for spoken En- (g) Diameter of moon craters: The cumulative dis-
glish. The most common words in spoken English are, in order, tribution of the diameter of moon craters. Rather
“I”, “and”, “the”, “to” and “that” [21]. than measuring the (integer) number of craters of
6
6
4
(a) 10 (b) (c)
10
4
4 10
10
2
10 2
2 10
10
0 0 0
10 10 10
0 2 4 0 2 4 0 2 4
10 10 10 10 10 10 10 10 10
word frequency citations web hits
2 (g) 4
10 (h) 100 (i)
10
3
0 10
10
2 10
-2 10
10
1
-4 10
10 1
2 3 4 5
0.01 0.1 1 10 10 10 10 1 10 100
crater diameter in km peak intensity intensity
4
10 (l)
(j) 4
10 (k)
100
2
2
10 10
10
0 0
1 10 10
9 10 4 5 6 3 5 7
10 10 10 10 10 10 10 10
net worth in US dollars name frequency population of city
FIG. 4 Cumulative distributions or “rank/frequency plots” of twelve quantities reputed to follow power laws. The distributions
were computed as described in Appendix A. Data in the shaded regions were excluded from the calculations of the exponents
in Table I. Source references for the data are given in the text. (a) Numbers of occurrences of words in the novel Moby Dick
by Hermann Melville. (b) Numbers of citations to scientific papers published in 1981, from time of publication until June
1997. (c) Numbers of hits on web sites by 60 000 users of the America Online Internet service for the day of 1 December 1997.
(d) Numbers of copies of bestselling books sold in the US between 1895 and 1965. (e) Number of calls received by AT&T
telephone customers in the US for a single day. (f) Magnitude of earthquakes in California between January 1910 and May 1992.
Magnitude is proportional to the logarithm of the maximum amplitude of the earthquake, and hence the distribution obeys a
power law even though the horizontal axis is linear. (g) Diameter of craters on the moon. Vertical axis is measured per square
kilometre. (h) Peak gamma-ray intensity of solar flares in counts per second, measured from Earth orbit between February
1980 and November 1989. (i) Intensity of wars from 1816 to 1980, measured as battle deaths per 10 000 of the population of the
participating countries. (j) Aggregate net worth in dollars of the richest individuals in the US in October 2003. (k) Frequency
of occurrence of family names in the US in the year 1990. (l) Populations of US cities in the year 2000.
II Measuring power laws 7
a given size on the whole surface of the moon, the as well (for example in Japan [28]) but not in all
vertical axis is normalized to measure number of cases. Korean family names for instance appear to
craters per square kilometre, which is why the axis have an exponential distribution [29].
goes below 1, unlike the rest of the plots, since it is
entirely possible for there to be less than one crater (l) Populations of cities: Cumulative distribution
of a given size per square kilometre. After Neukum of the size of the human populations of US cities as
and Ivanov [4]. recorded by the US Census Bureau in 2000.
(h) Intensity of solar flares: The cumulative dis- Few real-world distributions follow a power law over
tribution of the peak gamma-ray intensity of their entire range, and in particular not for smaller val-
solar flares. The observations were made be- ues of the variable being measured. As pointed out in
tween 1980 and 1989 by the instrument known the previous section, for any positive value of the expo-
as the Hard X-Ray Burst Spectrometer aboard nent α the function p(x) = Cx−α diverges as x → 0. In
the Solar Maximum Mission satellite launched reality therefore, the distribution must deviate from the
in 1980. The spectrometer used a CsI scin- power-law form below some minimum value xmin . In our
tillation detector to measure gamma-rays from computer-generated example of the last section we sim-
solar flares and the horizontal axis in the fig- ply cut off the distribution altogether below xmin so that
ure is calibrated in terms of scintillation counts p(x) = 0 in this region, but most real-world examples
per second from this detector. The data are are not that abrupt. Figure 4 shows distributions with
from the NASA Goddard Space Flight Center, a variety of behaviours for small values of the variable
umbra.nascom.nasa.gov/smm/hxrbs.html. See measured; the straight-line power-law form asserts itself
also Lu and Hamilton [5]. only for the higher values. Thus one often hears it said
(i) Intensity of wars: The cumulative distribution that the distribution of such-and-such a quantity “has a
of the intensity of 119 wars from 1816 to 1980. In- power-law tail”.
tensity is defined by taking the number of battle Extracting a value for the exponent α from distribu-
deaths among all participant countries in a war, tions like these can be a little tricky, since it requires
dividing by the total combined populations of the us to make a judgement, sometimes imprecise, about the
countries and multiplying by 10 000. For instance, value xmin above which the distribution follows the power
the intensities of the First and Second World Wars law. Once this judgement is made, however, α can be
were 141.5 and 106.3 battle deaths per 10 000 re- calculated simply from Eq. (5).6 (Care must be taken to
spectively. The worst war of the period covered use the correct value of n in the formula; n is the number
was the small but horrifically destructive Paraguay- of samples that actually go into the calculation, exclud-
Bolivia war of 1932–1935 with an intensity of 382.4. ing those with values below xmin , not the overall total
The data are from Small and Singer [27]. See also number of samples.)
Roberts and Turcotte [7]. Table I lists the estimated exponents for each of the
distributions of Fig. 4, along with standard errors and
(j) Wealth of the richest people: The cumulative also the values of xmin used in the calculations. Note
distribution of the total wealth of the richest people that the quoted errors correspond only to the statistical
in the United States. Wealth is defined as aggre- sampling error in the estimation of α; they include no
gate net worth, i.e., total value in dollars at current estimate of any errors introduced by the fact that a single
market prices of all an individual’s holdings, minus power-law function may not be a good model for the data
their debts. For instance, when the data were com- in some cases or for variation of the estimates with the
piled in 2003, America’s richest person, William H. value chosen for xmin .
Gates III, had an aggregate net worth of $46 bil- In the author’s opinion, the identification of some of
lion, much of it in the form of stocks of the company the distributions in Fig. 4 as following power laws should
he founded, Microsoft Corporation. Note that net be considered unconfirmed. While the power law seems
worth doesn’t actually correspond to the amount of to be an excellent model for most of the data sets de-
money individuals could spend if they wanted to: picted, a tenable case could be made that the distribu-
if Bill Gates were to sell all his Microsoft stock, for tions of web hits and family names might have two differ-
instance, or otherwise divest himself of any signif- ent power-law regimes with slightly different exponents.7
icant portion of it, it would certainly depress the
stock price. The data are from Forbes magazine, 6
October 2003.
6 Sometimes the tail is also cut off because there is, for one reason
(k) Frequencies of family names: Cumulative dis- or another, a limit on the largest value that may occur. An
tribution of the frequency of occurrence in the US of example is the finite-size effects found in critical phenomena—
the 89 000 most common family names, as recorded see Section IV.E. In this case, Eq. (5) must be modified [20].
by the US Census Bureau in 1990. Similar distribu- 7 Significantly more tenuous claims to power-law behaviour for
tions are observed for names in some other cultures other quantities have appeared elsewhere in the literature, for
8 Power laws, Pareto distributions and Zipf’s law
4
minimum exponent 1000 10
(a) (b)
quantity xmin α
3
(a) frequency of use of words 1 2.20(1) 10
100
(b) number of citations to papers 100 3.04(2) 2
10
(c) number of hits on web sites 1 2.40(1)
10 1
(d) copies of books sold in the US 2 000 000 3.51(16) 10
(e) telephone calls received 10 2.22(1) 0
1 10
(f) magnitude of earthquakes 3.8 3.04(4) 0 2 4
0 100 200 300
10 10 10
(g) diameter of moon craters 0.01 3.14(5)
abundance number of addresses
(h) intensity of solar flares 200 1.83(2)
(i) intensity of wars 3 1.80(9)
(j) net worth of Americans $600m 2.09(4) (c)
4
(k) frequency of family names 10 000 1.94(1) 10
B. Distributions that do not follow a power law books, which spans about three orders of magni-
tude but seems to follow a stretched exponential.
b
Power-law distributions are, as we have seen, impres- A stretched exponential is curve of the form e−ax
sively ubiquitous, but they are not the only form of broad for some constants a, b.
distribution. Lest I give the impression that everything
interesting follows a power law, let me emphasize that (c) The distribution of the sizes of forest fires, which
there are quite a number of quantities with highly right- spans six orders of magnitude and could follow a
skewed distributions that nonetheless do not obey power power law but with an exponential cutoff.
laws. A few of them, shown in Fig. 5, are the following: This being an article about power laws, I will not discuss
(a) The abundance of North American bird species, further the possible explanations for these distributions,
which spans over five orders of magnitude but is but the scientist confronted with a new set of data having
probably distributed according to a log-normal. A a broad dynamic range and a highly skewed distribution
log-normally distributed quantity is one whose log- should certainly bear in mind that a power-law model is
arithm is normally distributed; see Section IV.G only one of several possibilities for fitting it.
and Ref. [32] for further discussions.
(b) The number of entries in people’s email address III. THE MATHEMATICS OF POWER LAWS
with α > 0. As we saw in Section II.A, there must be then the mean of those many means is itself also for-
some lowest value xmin at which the power law is obeyed, mally divergent, since it is simply equal to the mean we
and we consider only the statistics of x above this value. would calculate if all the repetitions were combined into
one large experiment. This implies that, while the mean
may take a relatively small value on any particular repe-
A. Normalization tition of the experiment, it must occasionally take a huge
value, in order that the overall mean diverge as the num-
The constant C in Eq. (7) is given by the normalization ber of repetitions does. Thus there must be very large
requirement that fluctuations in the value of the mean, and this is what
Z ∞ Z ∞ the divergence in Eq. (11) really implies. In effect, our
C h −α+1 i∞
1= p(x)dx = C x−α dx = x . calculations are telling us that the mean is not a well
xmin xmin 1−α xmin defined quantity, because it can vary enormously from
(8) one measurement to the next, and indeed can become
We see immediately that this only makes sense if α > arbitrarily large. The formal divergence of hxi is a signal
1, since otherwise the right-hand side of the equation that, while we can quote a figure for the average of the
would diverge: power laws with exponents less than unity samples we measure, that figure is not a reliable guide to
cannot be normalized and don’t normally occur in nature. the typical size of the samples in another instance of the
If α > 1 then Eq. (8) gives same experiment.
For α > 2 however, the mean is perfectly well defined,
C = (α − 1)xα−1
min , (9)
with a value given by Eq. (11) of
and the correct normalized expression for the power law
itself is α−1
hxi = xmin . (12)
−α α−2
α−1 x
p(x) = . (10) We can also calculate higher moments of the distribu-
xmin xmin
tion p(x). For instance, the second moment, the mean
Some distributions follow a power law for part of their square, is given by
range but are cut off at high values of x. That is, above
some value they deviate from the power law and fall off
2 C h −α+3 i∞
x = x . (13)
quickly towards zero. If this happens, then the distribu- 3−α xmin
tion may be normalizable no matter what the value of
the exponent α. Even so, exponents less than unity are This diverges if α ≤ 3. Thus power-law distributions in
rarely, if ever, seen. this range, which includes almost all of those in Table I,
have no meaningful mean square, and thus also no mean-
ingful variance or standard deviation. If α > 3, then the
B. Moments second moment is finite and well-defined, taking the value
x . (14)
tity x is given by α − 3 min
Z ∞ Z ∞
These results can easily be extended to show that in
hxi = xp(x) dx = C x−α+1 dx
xmin xmin general all moments hxm i exist for m < α − 1 and all
C h −α+2 i∞ higher moments diverge. The ones that do exist are given
= x . (11) by
2−α xmin
the quantity P (x) defined in Eq. (3): Thus, as long as α > 1, we find that hxmax i always in-
−α+1 creases as n becomes larger.10
∞
C x
Z
P (x) = p(x′ ) dx′ = x−α+1 = ,
x α−1 xmin
(16) D. Top-heavy distributions and the 80/20 rule
so long as α > 1. And the probability that a sample is
not greater than x is 1 − P (x). Thus the probability that Another interesting question is where the majority of
a particular sample we draw, sample i, will lie between the distribution of x lies. For any power law with expo-
x and x + dx and that all the others will be no greater nent α > 1, the median is well defined. That is, there is
than it is p(x) dx × [1 − P (x)]n−1 . Then there are n ways a point x1/2 that divides the distribution in half so that
to choose i, giving a total probability half the measured values of x lie above x1/2 and half lie
below. That point is given by
π(x) = np(x)[1 − P (x)]n−1 . (17) Z ∞ Z ∞
1
p(x) dx = 2 p(x) dx, (24)
Now we can calculate the mean value hxmax i of the x1/2 xmin
largest sample thus:
Z ∞ Z ∞ or
hxmax i = xπ(x)dx = n xp(x)[1−P (x)]n−1 dx. x1/2 = 21/(α−1) xmin . (25)
xmin xmin
(18)
Using Eqs. (10) and (16), this is So, for example, if we are considering the distribution
of wealth, there will be some well-defined median wealth
hxmax i = n(α − 1) × that divides the richer half of the population from the
Z ∞ −α+1 −α+1 n−1 poorer. But we can also ask how much of the wealth
x x itself lies in those two halves. Obviously more than half
1− dx
xmin xmin xmin of the total amount of money belongs to the richer half of
Z 1 the population. The fraction of the money in the richer
y n−1
= nxmin 1/(α−1)
dy half is given by
0 (1 − y)
R∞
= nxmin B n, (α − 2)/(α − 1) , (19) xp(x) dx x1/2 −α+2
x
R ∞1/2 = = 2−(α−2)/(α−1) , (26)
where I have made the substitution y = 1−(x/xmin)−α+1 x
xp(x) dx xmin
min
and B(a, b) is Legendre’s beta-function,8 which is defined
by provided α > 2 so that the integrals converge. Thus,
for instance, if α = 2.1 for the wealth distribution, as
Γ(a)Γ(b) indicated in Table I, then a fraction 2−0.091 ≃ 94% of the
B(a, b) = , (20) wealth is in the hands of the richer 50% of the population,
Γ(a + b)
making the distribution quite top-heavy.
with Γ(a) the standard Γ-function: More generally, the fraction of the population whose
Z ∞ personal wealth exceeds x is given by the quantity P (x),
Γ(a) = ta−1 e−t dt. (21) Eq. (16), and the fraction of the total wealth in the hands
0 of those people is
The beta-function has the interesting property that R∞ ′ ′ ′ −α+2
x x p(x ) dx
x
for large values of either of its arguments it itself fol- W (x) = R ∞ = , (27)
lows a power law.9 For instance, for large a and fixed b, x x′ p(x′ ) dx′ xmin
min
Setting x = 1 we find that the constant is simply ln p(1), If, as is usually the case, the power-law behaviour is seen
and then taking exponentials of both sides only in the tail of the distribution, for values k ≥ kmin ,
then the equivalent expression is
p(x) = p(1) x−α , (34)
k −α
pk = , (38)
′
where α = −p(1)/p (1). Thus, as advertised, the power- ζ(α, kmin )
law distribution is the only function satisfying the scale- P∞ −α
free criterion (29). where ζ(α, kmin ) = k=kmin k is the generalized or
This fact is more than just a curiosity. As we will incomplete ζ-function.
see in Section IV.E, there are some systems that become Most of the results of the previous sections can be gen-
scale-free for certain special values of their governing pa- eralized to the case of discrete variables, although the
rameters. The point defined by such a special value is mathematics is usually harder and often involves special
called a “continuous phase transition” and the argument functions in place of the more tractable integrals of the
given above implies that at such a point the observable continuous case.
quantities in the system should adopt a power-law dis- It has occasionally been proposed that Eq. (35) is not
tribution. This indeed is seen experimentally and the the best generalization of the power law to the discrete
distributions so generated provided the original motiva- case. An alternative and often more convenient form is
tion for the study of power laws in physics (although Γ(k)Γ(α)
most experimentally observed power laws are probably pk = C = C B(k, α), (39)
Γ(k + α)
not the result of phase transitions—a variety of other
mechanisms produce power-law behaviour as well, as we where B(a, b) is, as before, the Legendre beta-function,
will shortly see). Eq. (20). As mentioned in Section III.C, the beta-
function behaves as a power law B(k, α) ∼ k −α for large k
and so the distribution has the desired asymptotic form.
F. Power laws for discrete variables Simon [35] proposed that Eq. (39) be called the Yule dis-
tribution, after Udny Yule who derived it as the limiting
So far I have focused on power-law distributions for distribution in a certain stochastic process [36], and this
continuous real variables, but many of the quantities we name is often used today. Yule’s result is described in
deal with in practical situations are in fact discrete— Section IV.D.
usually integers. For instance, populations of cities, num- The Yule distribution is nice because sums involving it
bers of citations to papers or numbers of copies of books can frequently be performed in closed form, where sums
sold are all integer quantities. In most cases, the distinc- involving Eq. (35) can only be written in terms of special
tion is not very important. The power law is obeyed only functions. For instance, the normalizing constant C for
in the tail of the distribution where the values measured the Yule distribution is given by
are so large that, to all intents and purposes, they can be
∞
considered continuous. Technically however, power-law X C
distributions should be defined slightly differently for in- 1=C B(k, α) = , (40)
α−1
teger quantities. k=1
pk = Ck −α , (35) The first and second moments (i.e., the mean and mean
square of the distribution) are
for some constant exponent α. Clearly this distribution
cannot hold all the way down to k = 0, since it diverges α−1
2 (α − 1)2
hki = , k = , (42)
there, but it could in theory hold down to k = 1. If we α−2 (α − 2)(α − 3)
discard any data for k = 0, the constant C would then
be given by the normalization condition and there are similarly simple expressions corresponding
to many of our earlier results for the continuous case.
∞
X ∞
X
1= pk = C k −α = Cζ(α), (36)
k=1 k=1 IV. MECHANISMS FOR GENERATING POWER-LAW
DISTRIBUTIONS
where ζ(α) is the Riemann ζ-function. Rearranging, we
find that C = 1/ζ(α) and In this section we look at possible candidate mech-
anisms by which power-law distributions might arise in
k −α natural and man-made systems. Some of the possibilities
pk = . (37)
ζ(α) that have been suggested are quite complex—notably the
IV Mechanisms for generating power-law distributions 13
physics of critical phenomena and the tools of the renor- Thus, following our argument above, the distribution of
malization group that are used to analyse it. But let us frequencies of words has the form p(x) ∼ x−α with
start with some simple algebraic methods of generating
power-law functions and progress to the more involved a 2 ln m − ln(1 − qs )
α=1− = . (47)
mechanisms later. b ln m − ln(1 − qs )
For the typical case where m is reasonably large and qs
quite small this gives α ≃ 2 in approximate agreement
A. Combinations of exponentials with Table I.
This is a reasonable theory as far as it goes, but real
A much more common distribution than the power law text is not made up of random letters. Most combina-
is the exponential, which arises in many circumstances, tions of letters don’t occur in natural languages; most are
such as survival times for decaying atomic nuclei or the not even pronounceable. We might imagine that some
Boltzmann distribution of energies in statistical mechan- constant fraction of possible letter sequences of a given
ics. Suppose some quantity y has an exponential distri- length would correspond to real words and the argument
bution: above would then work just fine when applied to that
fraction, but upon reflection this suggestion is obviously
p(y) ∼ eay . (43) bogus. It is clear for instance that very long words sim-
ply don’t exist in most languages, although there are ex-
The constant a might be either negative or positive. If ponentially many possible combinations of letters avail-
it is positive then there must also be a cutoff on the able to make them up. This observation is backed up
distribution—a limit on the maximum value of y—so that by empirical data. In Fig. 7a we show a histogram of
the distribution is normalizable. the lengths of words occurring in the text of Moby Dick,
Now suppose that the real quantity we are interested in and one would need a particularly vivid imagination to
is not y but some other quantity x, which is exponentially convince oneself that this histogram follows anything like
related to y thus: the exponential assumed by Miller’s argument. (In fact,
the curve appears roughly to follow a log-normal [32].)
x ∼ eby , (44) There may still be some merit in Miller’s argument
however. The problem may be that we are measuring
with b another constant, also either positive or negative. word “length” in the wrong units. Letters are not really
Then the probability distribution of x is the basic units of language. Some basic units are letters,
but some are groups of letters. The letters “th” for ex-
dy eay x−1+a/b ample often occur together in English and make a single
p(x) = p(y) ∼ by = , (45)
dx be b sound, so perhaps they should be considered to be a sep-
arate symbol in their own right and contribute only one
which is a power law with exponent α = 1 − a/b.
unit to the word length?
A version of this mechanism was used by Miller [37] to
Following this idea to its logical conclusion we
explain the power-law distribution of the frequencies of
can imagine replacing each fundamental unit of the
words as follows (see also [38]). Suppose we type ran-
language—whatever that is—by its own symbol and then
domly on a typewriter,11 pressing the space bar with
measuring lengths in terms of numbers of symbols. The
probability qs per stroke and each letter with equal prob-
pursuit of ideas along these lines led Claude Shannon
ability ql per stroke. If there are m letters in the alpha-
in the 1940s to develop the field of information the-
bet then ql = (1 − qs )/m. (In this simplest version of the
ory, which gives a precise prescription for calculating the
argument we also type no punctuation, digits or other
number of symbols necessary to transmit words or any
non-letter symbols.) Then the frequency x with which
other data [39, 40]. The units of information are bits and
a particular word with y letters (followed by a space)
the true “length” of a word can be considered to be the
occurs is
number of bits of information it carries. Shannon showed
y that if we regard words as the basic divisions of a mes-
1 − qs
x= qs ∼ eby , (46) sage, the information y carried by any particular word
m
is
where b = ln(1 − qs ) − ln m. The number (or fraction) of y = −k ln x, (48)
distinct possible words with length between y and y + dy
goes up exponentially as p(y) ∼ my = eay with a = ln m. where x is the frequency of the word as before and k is
a constant. (The reader interested in finding out more
about where this simple relation comes from is recom-
mended to look at the excellent introduction to informa-
11 This argument is sometimes called the “monkeys with typewrit- tion theory by Cover and Thomas [41].)
ers” argument, the monkey being the traditional exemplar of a But this has precisely the form that we want. Inverting
random typist. it we have x = e−y/k and if the probability distribution of
14 Power laws, Pareto distributions and Zipf’s law
4 B. Inverses of quantities
10 (a) (b)
3
10
ative values. And suppose further that the quantity we
are really interested in is the reciprocal x = 1/y, which
10
2 will have distribution
dy p(y)
1 p(x) = p(y) =− 2 . (49)
10 dx x
FIG. 7 (a) Histogram of the lengths in letters of all distinct p(x) ∼ x−2 , (50)
words in the text of the novel Moby Dick. (b) Histogram of
the information content a la Shannon of words in Moby Dick. where the constant of proportionality is p(y = 0).
The former does not, by any stretch of the imagination, follow More generally, any quantity x = y −γ for some γ will
an exponential, but the latter could easily be said to do so. have a power-law tail to its distribution p(x) ∼ x−α , with
(Note that the vertical axes are logarithmic.)
α = 1+1/γ. It is not clear who the first author or authors
were to describe this mechanism,12 but clear descriptions
have been given recently by Bouchaud [44], Jan et al. [45]
and Sornette [46].
One might argue that this mechanism merely generates
a power law by assuming another one: the power-law re-
the “lengths” measured in terms of bits is also exponen- lationship between x and y generates a power-law distri-
tial as in Eq. (43) we will get our power-law distribution. bution for x. This is true, but the point is that the mecha-
Figure 7b shows the latter distribution, and indeed it nism takes some physical power-law relationship between
follows a nice exponential—much better than Fig. 7a. x and y—not a stochastic probability distribution—and
This is still not an entirely satisfactory explanation. from that generates a power-law probability distribution.
Having made the shift from pure word length to informa- This is a non-trivial result.
tion content, our simple count of the number of words of One circumstance in which this mechanism arises is
length y—that it goes exponentially as my —is no longer in measurements of the fractional change in a quantity.
valid, and now we need some reason why there should be For instance, Jan et al. [45] consider one of the most
exponentially more distinct words in the language of high famous systems in theoretical physics, the Ising model of
information content than of low. That this is the case is a magnet. In its paramagnetic phase, the Ising model has
experimentally verified by Fig. 7b, but the reason must a magnetization that fluctuates around zero. Suppose we
be considered still a matter of debate. Some possibilities measure the magnetization m at uniform intervals and
are discussed by, for instance, Mandelbrot [42] and more calculate the fractional change δ = (∆m)/m between
recently by Mitzenmacher [19]. each successive pair of measurements. The change ∆m
Another example of the “combination of exponentials” is roughly normally distributed and has a typical size set
mechanism has been discussed by Reed and Hughes [43]. by the width of that normal distribution. The 1/m on the
They consider a process in which a set of items, piles or other hand produces a power-law tail when small values
groups each grows exponentially in time, having size x ∼ of m coincide with large values of ∆m, so that the tail of
ebt with b > 0. For instance, populations of organisms the distribution of δ follows p(δ) ∼ δ −2 as above.
reproducing freely without resource constraints grow ex- In Fig. 8 I show a cumulative histogram of mea-
ponentially. Items also have some fixed probability of surements of δ for simulations of the Ising model on a
dying per unit time (populations might have a stochas- square lattice, and the power-law distribution is clearly
tically constant probability of extinction), so that the visible. Using Eq. (5), the value of the exponent is
times t at which they die are exponentially distributed α = 1.98 ± 0.04, in good agreement with the expected
p(t) ∼ eat with a < 0. value of 2.
These functions again follow the form of Eqs. (43)
and (44) and result in a power-law distribution of the
sizes x of the items or groups at the time they die. Reed
and Hughes suggest that variations on this argument may 12 A correspondent tells me that a similar mechanism was described
explain the sizes of biological taxa, incomes and cities, in an astrophysical context by Chandrasekhar in a paper in 1943,
among other things. but I have been unable to confirm this.
IV Mechanisms for generating power-law distributions 15
4
10 t = 2m t = 2n
position
3
10
t
2
10
and comparing this expression with Eq. (52), we imme- D. The Yule process
diately see that
One of the most convincing and widely applicable
2n
n mechanisms for generating power laws is the Yule pro-
f2n = , (59)
(2n − 1) 22n cess, whose invention was, coincidentally, also inspired
by observations of the statistics of biological taxa as dis-
and we have our solution for the distribution of first re- cussed in the previous section.
turn times. In addition to having a (possibly) power-law distribu-
Now consider the form of f2n for large n. Writing out tion of lifetimes, biological taxa also have a very convinc-
the binomial coefficient as 2n
n = (2n)!/(n!)2 , we take ing power-law distribution of sizes. That is, the distribu-
logs thus: tion of the number of species in a genus, family or other
taxonomic group appears to follow a power law quite
ln f2n = ln(2n)! − 2 ln n! − 2n ln 2 − ln(2n − 1), (60) closely. This phenomenon was first reported by Willis
1
and Yule in 1922 for the example of flowering plants [15].
and use Sterling’s formula ln n! ≃ n ln n − n + 2 ln n to Three years later, Yule [36] offered an explanation using
get ln f2n ≃ 12 ln 2 − 21 ln n − ln(2n − 1), or a simple model that has since found wide application in
s other areas. He argued as follows.
2 Suppose first that new species appear but they never
f2n ≃ . (61) die; species are only ever added to genera and never re-
n(2n − 1)2
moved. This differs from the random walk model of the
last section, and certainly from reality as well. It is be-
In the limit n → ∞, this implies that f2n ∼ n−3/2 , or
lieved that in practice all species and all genera become
equivalently
extinct in the end. But let us persevere; there is nonethe-
ft ∼ t−3/2 . (62) less much of worth in Yule’s simple model.
Species are added to genera by speciation, the splitting
So the distribution of return times follows a power law of one species into two, which is known to happen by a va-
with exponent α = 23 . Note that the distribution has a
divergent mean (because α ≤ 2). As discussed in Sec-
tion III.C, this implies that the mean is finite for any
finite sample but can take very different values for dif- 15 Modern phylogenetic analysis, the quantitative comparison of
ferent samples, so that the value measured for any one species’ genetic material, can provide a picture of the evolution-
sample gives little or no information about the value for ary tree and hence allow the accurate “cladistic” assignment of
species to taxa. For prehistoric species, however, whose genetic
any other. material is not usually available, determination of evolutionary
As an example application, the random walk can be ancestry is difficult, so classification into taxa is based instead
considered a simple model for the lifetime of biological on morphology, i.e., on the shapes of organisms. It is widely ac-
taxa. A taxon is a branch of the evolutionary tree, a knowledged that such classifications are subjective and that the
taxonomic assignments of fossil species are probably riddled with
errors.
16 To be fair, I consider the power law for the distribution of genus
lifetimes to fall in the category of “tenuous” identifications to
14 The enthusiastic reader can easily derive this result for him or which I alluded in footnote 7. This theory should be taken with
herself by expanding (1 − z)−1/2 using the binomial theorem. a pinch of salt.
IV Mechanisms for generating power-law distributions 17
riety of mechanisms, including competition for resources, genera with k species thus:
spatial separation of breeding populations and genetic
m
drift. If we assume that this happens at some stochasti- (n + 1)pk,n+1 = npk,n + (k − 1)pk−1,n − kpk,n .
cally constant rate, then it follows that a genus with k m+1
(64)
species in it will gain new species at a rate proportional
The only exception to this equation is for genera of size 1,
to k, since each of the k species has the same chance per
which instead obey the equation
unit time of dividing in two. Let us further suppose that
occasionally, say once every m speciation events, the new m
(n + 1)p1,n+1 = np1,n + 1 − p1,n , (65)
species produced is, by chance, sufficiently different from m+1
the others in its genus as to be considered the founder
member of an entire new genus. (To be clear, we define since by definition exactly one new such genus appears
m such that m species are added to pre-existing genera on each time step.
and then one species forms a new genus. So m + 1 new Now we ask what form the distribution of the sizes of
species appear for each new genus and there are m + 1 genera takes in the limit of long times. To do this we
species per genus on average.) Thus the number of gen- allow n → ∞ and assume that the distribution tends
era goes up steadily in this model, as does the number of to some fixed value pk = limn→∞ pn,k independent of n.
species within each genus. Then Eq. (65) becomes p1 = 1 − mp1 /(m + 1), which has
We can analyse this Yule process mathematically as the solution
follows.17 Let us measure the passage of time in the m+1
model by the number of genera n. At each time-step p1 = . (66)
2m + 1
one new species founds a new genus, thereby increasing
n by 1, and m other species are added to various pre- And Eq. (64) becomes
existing genera which are selected in proportion to the
number of species they already have. We denote by pk,n m
pk = (k − 1)pk−1 − kpk , (67)
the fraction of genera that have k species when the total m+1
number of genera is n. Thus the number of such genera which can be rearranged to read
is npk,n . We now ask what the probability is that the
next species added to the system happens to be added to k−1
a particular genus i having ki species in it already. This pk = pk−1 , (68)
k + 1 + 1/m
probability is proportional
P to ki , P and so when properly
normalized is just ki / i ki . But i ki is simply the to- and then iterated to get
tal number of species, which is n(m + 1). Furthermore,
between the appearance of the nth and the (n + 1)th (k − 1)(k − 2) . . . 1
pk = p1
genera, m other new species are added, so the probabil- (k + 1 + 1/m)(k + 1/m) . . . (3 + 1/m)
ity that genus i gains a new species during this interval is (k − 1) . . . 1
mki /(n(m + 1)). And the total expected number of gen- = (1 + 1/m) , (69)
(k + 1 + 1/m) . . . (2 + 1/m)
era of size k that gain a new species in the same interval
is where I have made use of Eq. (66). This can be simpli-
fied further by making use of a handy property of the
Γ-function, Eq. (21), that Γ(a) = (a − 1)Γ(a − 1). Using
mk m this, and noting that Γ(1) = 1, we get
× npk,n = kpk,n . (63)
n(m + 1) m+1
Γ(k)Γ(2 + 1/m)
pk = (1 + 1/m)
Γ(k + 2 + 1/m)
Now we observe that the number of genera with k = (1 + 1/m)B(k, 2 + 1/m), (70)
species will decrease on each time step by exactly this
number, since by gaining a new species they become gen- where B(a, b) is again the beta-function, Eq. (20). This,
era with k + 1 instead. At the same time the number we note, is precisely the distribution defined in Eq. (39),
increases because of species that previously had k − 1 which Simon called the Yule distribution. Since the beta-
species and now have an extra one. Thus we can write function has a power-law tail B(a, b) ∼ a−b , we can im-
a master equation for the new number (n + 1)pk,n+1 of mediately see that pk also has a power-law tail with an
exponent
1
α = 2+ . (71)
17 Yule’s analysis of the process was considerably more involved m
than the one presented here, essentially because the theory of
stochastic processes as we now know it did not yet exist in his The mean number m + 1 of species per genus for the
time. The master equation method we employ is a relatively example of flowering plants is about 3, making m ≃ 2
modern innovation, introduced in this context by Simon [35]. and α ≃ 2.5. The actual exponent for the distribution
18 Power laws, Pareto distributions and Zipf’s law
The actual exponent for the distribution found by Willis and Yule [15] is α = 2.5 ± 0.1, which is in excellent agreement with the theory.

Most likely this agreement is fortuitous, however. The Yule process is probably not a terribly realistic explanation for the distribution of the sizes of genera, principally because it ignores the fact that species (and genera) become extinct. However, it has been adapted and generalized by others to explain power laws in many other systems, most famously city sizes [35], paper citations [50, 51], and links to pages on the world wide web [52, 53]. The most general form of the Yule process is as follows.

Suppose we have a system composed of a collection of objects, such as genera, cities, papers, web pages and so forth. New objects appear every once in a while as cities grow up or people publish new papers. Each object also has some property k associated with it, such as number of species in a genus, people in a city or citations to a paper, that is reputed to obey a power law, and it is this power law that we wish to explain. Newly appearing objects have some initial value of k which we will denote k_0. New genera initially have only a single species, k_0 = 1, but new towns or cities might have quite a large initial population—a single person living in a house somewhere is unlikely to constitute a town in their own right but k_0 = 100 people might do so. The value of k_0 can also be zero in some cases: newly published papers usually have zero citations, for instance.

In between the appearance of one object and the next, m new species/people/citations etc. are added to the entire system. That is, some cities or papers will get new people or citations, but not necessarily all will. And in the simplest case these are added to objects in proportion to the number that the object already has. Thus the probability of a city gaining a new member is proportional to the number already there; the probability of a paper getting a new citation is proportional to the number it already has. In many cases this seems like a natural process. For example, a paper that already has many citations is more likely to be discovered during a literature search and hence more likely to be cited again. Simon [35] dubbed this type of "rich-get-richer" process the Gibrat principle. Elsewhere it also goes by the names of the Matthew effect [54], cumulative advantage [50], or preferential attachment [52].

There is a problem however when k_0 = 0. For example, if new papers appear with no citations and garner citations in proportion to the number they currently have, which is zero, then no paper will ever get any citations! To overcome this problem one typically assigns new citations not in proportion simply to k, but to k + c, where c is some constant. Thus there are three parameters k_0, c and m that control the behaviour of the model.
By an argument exactly analogous to the one given above, one can then derive the master equation
(n + 1) p_{k,n+1} = n p_{k,n} + m (k − 1 + c)/(k_0 + c + m) p_{k−1,n} − m (k + c)/(k_0 + c + m) p_{k,n},   for k > k_0,   (72)
and
(n + 1) p_{k_0,n+1} = n p_{k_0,n} + 1 − m (k_0 + c)/(k_0 + c + m) p_{k_0,n},   for k = k_0.   (73)
(Note that k is never less than k_0, since each object appears with k = k_0 initially.)
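The behaviour described by Eqs. (72) and (73) is easy to explore numerically. The following Python sketch (an illustration added here, not code from any of the cited studies) grows a system object by object, handing out m new units between appearances with probability proportional to k + c; with the choices k_0 = 1 and c = 0 it reduces to the original Yule process, whose tail exponent should be close to 2 + 1/m.

import random

def general_yule(n_objects=100_000, m=3, k0=1, c=0.0, seed=1):
    """Minimal sketch of the generalized Yule process described above.

    Objects appear one at a time with k = k0; between one appearance and
    the next, m new units (species, people, citations, ...) are attached
    to existing objects with probability proportional to k + c.
    """
    rng = random.Random(seed)
    k = [k0]            # current k-value of each object
    urn = [0] * k0      # object index repeated once for each unit it holds
    for _ in range(n_objects - 1):
        for _ in range(m):
            total = len(urn) + c * len(k)
            # P(object i) = (k_i + c)/total: with probability sum(k)/total
            # draw from the urn (proportional to k), otherwise draw an
            # object uniformly at random (the "+ c" part).
            if total > 0 and rng.random() < len(urn) / total:
                i = urn[rng.randrange(len(urn))]
            else:
                i = rng.randrange(len(k))
            k[i] += 1
            urn.append(i)
        urn.extend([len(k)] * k0)   # the new object appears...
        k.append(k0)                # ...with k = k0 units

    return k

# With k0 = 1 and c = 0 this is the Yule process, so the cumulative
# distribution P(K >= k) should fall off roughly as k^-(1 + 1/m).
ks = general_yule()
for kk in [1, 2, 4, 8, 16, 32, 64, 128]:
    print(kk, sum(1 for x in ks if x >= kk) / len(ks))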
FIG. 11 Three examples of percolation systems on 100 × 100 square lattices with p = 0.3, p = p_c = 0.5927 . . . and p = 0.9. The first and last are well below and above the critical point respectively, while the middle example is precisely at it.

To illustrate this phenomenon, I show in Fig. 12 a plot of ⟨s⟩ from simulations of the percolation model and the divergence is clear.

FIG. 12 The mean area of the cluster to which a randomly chosen square belongs for the percolation model described in the text, calculated from an average over 1000 simulations on a 1000×1000 square lattice. The dotted line marks the known position of the phase transition. (The plot shows mean cluster size on the vertical axis against percolation probability on the horizontal axis.)

Now consider not just the mean cluster size but the entire distribution of cluster sizes. Let p(s) be the probability that a randomly chosen square belongs to a cluster of area s. In general, what forms can p(s) take as a function of s? The important point to notice is that p(s), being a probability distribution, is a dimensionless quantity—just a number—but s is an area. We could measure s in terms of square metres, or whatever units the lattice is calibrated in. The average ⟨s⟩ is also an area and then there is the area of a unit square itself, which we will denote a. Other than these three quantities, however, there are no other independent parameters with dimensions in this problem. (There is the area of the whole lattice, but we are considering the limit where that becomes infinite, so it's out of the picture.)

If we want to make a dimensionless function p(s) out of these three dimensionful parameters, there are three dimensionless ratios we can form: s/a, a/⟨s⟩ and s/⟨s⟩ (or their reciprocals, if we prefer). Only two of these are independent however, since the last is the product of the other two. Thus in general we can write

p(s) = C f(s/a, a/⟨s⟩),   (78)

where f is a dimensionless mathematical function of its dimensionless arguments and C is a normalizing constant chosen so that Σ_s p(s) = 1.

But now here's the trick. We can coarse-grain or rescale our lattice so that the fundamental unit of the lattice changes. For instance, we could double the size of our unit square a. The kind of picture I'm thinking of is shown in Fig. 13. The basic percolation clusters stay roughly the same size and shape, although I've had to fudge things around the edges a bit to make it work. For this reason this argument will only be strictly correct for large clusters s whose area is not changed appreciably by the fudging. (And the argument thus only tells us that the tail of the distribution is a power law, and not the whole distribution.)

FIG. 13 A site percolation system is coarse-grained, so that the area of the fundamental square is (in this case) quadrupled. The occupation of the squares in the coarse-grained lattice (right) is chosen to mirror as nearly as possible that of the squares on the original lattice (left), so that the sizes and shapes of the large clusters remain roughly the same. The small clusters are mostly lost in the coarse-graining, so that the arguments given in the text are valid only for the large-s tail of the cluster size distribution.
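For readers who want to see this behaviour directly, the following short sketch (an illustration added here, not taken from the simulations behind Figs. 12 and 14; it uses scipy.ndimage for the cluster labelling) occupies each square of an L × L lattice independently with probability p, finds the connected clusters, and records the area of the cluster to which each occupied square belongs. Run near p = p_c = 0.5927 the resulting distribution develops the broad, power-law-like tail described in the text, cut off only by the finite lattice size.

import numpy as np
from scipy import ndimage

def cluster_areas(L=1000, p=0.5927, seed=0):
    """Site percolation on an L x L square lattice.

    Each square is occupied independently with probability p.  Returns,
    for every occupied square, the area s of the (nearest-neighbour)
    cluster it belongs to.
    """
    rng = np.random.default_rng(seed)
    occupied = rng.random((L, L)) < p
    labels, n = ndimage.label(occupied)       # 4-connected cluster labels
    sizes = ndimage.sum(occupied, labels, index=np.arange(1, n + 1))
    return sizes[labels[occupied] - 1]        # cluster area for each occupied square

s = cluster_areas()
# Crude look at the tail: fraction of occupied squares in clusters of area >= s0.
for s0 in [1, 10, 100, 1000, 10_000, 100_000]:
    print(s0, np.mean(s >= s0))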
The probability p(s) of getting a cluster of area s is unchanged by the coarse-graining since the areas themselves are, to a good approximation, unchanged, and the

where g(b) = C/C′. Comparing with Eq. (29) we see that this has precisely the form of the equation that defines a scale-free distribution. The rest of the derivation below Eq. (29) follows immediately, and so we know that p(s) must follow a power law.

This in fact is the origin of the name "scale-free" for a distribution of the form (29). At the point at which ⟨s⟩ diverges, the system is left with no defining size-scale, other than the unit of area a itself. It is "scale-free", and by the argument above it follows that the distribution of s must obey a power law.

In Fig. 14 I show an example of a cumulative distribution of cluster sizes for a percolation system right at the critical point and, as the figure shows, the distribution does indeed follow a power law. Technically the distribution cannot follow a power law to arbitrarily large cluster sizes since the area of a cluster can be no bigger than the area of the whole lattice, so the power-law distribution will be cut off in the tail. This is an example of a finite-size effect. This point does not seem to be visible in Fig. 14 however.

The kinds of arguments given in this section can be made more precise using the machinery of the renormalization group. The real-space renormalization group makes use precisely of transformations such as that

F. Self-organized criticality

As discussed in the preceding section, certain systems develop power-law distributions at special "critical" points in their parameter space because of the divergence of some characteristic scale, such as the mean cluster size in the percolation model. This does not, however, provide a plausible explanation for the origin of power laws in most real systems. Even if we could come up with some model of earthquakes or solar flares or web hits that had such a divergence, it seems unlikely that the parameters of the real world would, just coincidentally, fall precisely at the point where the divergence occurred.

As first proposed by Bak et al. [57], however, it is possible that some dynamical systems actually arrange themselves so that they always sit at the critical point, no matter what state we start off in. One says that such systems self-organize to the critical point, or that they display self-organized criticality. A now-classic example of such a system is the forest fire model of Drossel and Schwabl [58], which is based on the percolation model we have already seen.

Consider the percolation model as a primitive model of a forest. The lattice represents the landscape and a single tree can grow in each square. Occupied squares
represent trees and empty squares represent empty plots of land with no trees. Trees appear instantaneously at random at some constant rate and hence the squares of the lattice fill up at random. Every once in a while a wildfire starts at a random square on the lattice, set off by a lightning strike perhaps, and burns the tree in that square, if there is one, along with every other tree in the cluster connected to it. The process is illustrated in Fig. 15. One can think of the fire as leaping from tree to adjacent tree until the whole cluster is burned, but the fire cannot cross the firebreak formed by an empty square. If there is no tree in the square struck by the lightning, then nothing happens. After a fire, trees can grow up again in the squares vacated by burnt trees, so the process keeps going indefinitely.

If we start with an empty lattice, trees will start to appear but will initially be sparse and lightning strikes will either hit empty squares or if they do chance upon a tree they will burn it and its cluster, but that cluster will be small and localized because we are well below the percolation threshold. Thus fires will have essentially no effect on the forest. As time goes by however, more and more trees will grow up until at some point there are enough that we have percolation. At that point, as we have seen, a spanning cluster forms whose size is limited only by the size of the lattice, and when any tree in that cluster gets hit by the lightning the entire cluster will burn away. This gets rid of the spanning cluster so that the system does not percolate any more, but over time as more trees appear it will presumably reach percolation again, and so the scenario will play out repeatedly. The end result is that the system oscillates right around the critical point, first going just above the percolation threshold as trees appear and then being beaten back below it by fire. In the limit of large system size these fluctuations become small compared to the size of the system as a whole and to an excellent approximation the system just sits at the threshold indefinitely. Thus, if we wait long enough, we expect the forest fire model to self-organize to a state in which it has a power-law distribution of the sizes of clusters, or of the sizes of fires.

In Fig. 16 I show the cumulative distribution of the sizes of fires in the forest fire model and, as we can see, it follows a power law closely.

FIG. 16 Cumulative distribution of the sizes of "fires" in a simulation of the forest fire model of Drossel and Schwabl [58] for a square lattice of size 5000 × 5000.

The exponent of the distribution is quite small in this case. The best current estimates give a value of α = 1.19 ± 0.01 [59], meaning that the distribution has an infinite mean in the limit of large system size. For all real systems however the mean is finite: the distribution is cut off in the large-size tail because fires cannot have a size any greater than that of the lattice as a whole and this makes the mean well-behaved. This cutoff is clearly visible in Fig. 16 as the drop in the curve towards the right of the plot. What's more, the distribution of the sizes of fires in real forests, Fig. 5d, shows a similar cutoff and is in many ways qualitatively similar to the distribution predicted by the model. (Real forests are obviously vastly more complex than the forest fire model, and no one is seriously suggesting that the model is an accurate representation of the real world. Rather it is a guide to the general type of processes that might be going on in forests.)

There has been much excitement about self-organized criticality as a possible generic mechanism for explaining where power-law distributions come from. Per Bak, one of the originators of the idea, wrote an entire book about it [60]. Self-organized critical models have been put forward not only for forest fires, but for earthquakes [61, 62], solar flares [5], biological evolution [63], avalanches [57] and many other phenomena. Although it is probably not the universal law that some have claimed it to be, it is certainly a powerful and intriguing concept that potentially has applications to a variety of natural and man-made systems.
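A toy version of these dynamics is easy to write down. The sketch below (an illustration in the spirit of the model described above, not the code behind Fig. 16; the lattice size and the number of growth attempts per lightning strike are arbitrary choices) grows trees on random squares, occasionally strikes a random square with lightning, and burns the connected cluster of trees containing it, recording the size of each fire.

import numpy as np
from scipy import ndimage

def forest_fires(L=250, steps=400_000, growth_per_strike=200, seed=0):
    """Toy forest fire dynamics as described in the text.

    Trees appear on randomly chosen squares; after every
    `growth_per_strike` growth attempts, lightning strikes a random
    square and, if it holds a tree, the whole connected cluster of
    trees burns.  Returns the sizes of all fires.
    """
    rng = np.random.default_rng(seed)
    forest = np.zeros((L, L), dtype=bool)
    fires = []
    for t in range(steps):
        i, j = rng.integers(0, L, size=2)
        forest[i, j] = True                   # a tree grows (nothing changes if one is there)
        if t % growth_per_strike == 0:
            x, y = rng.integers(0, L, size=2)
            if forest[x, y]:                  # lightning hits a tree
                labels, _ = ndimage.label(forest)
                cluster = labels == labels[x, y]
                fires.append(int(cluster.sum()))
                forest[cluster] = False       # the whole cluster burns away
    return fires

fires = forest_fires()
# Cumulative distribution of fire sizes, in the spirit of Fig. 16.
for s0 in [1, 10, 100, 1000, 10_000]:
    print(s0, sum(f >= s0 for f in fires) / len(fires))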
G. Other mechanisms for generating power laws

In the preceding sections I've described the best known and most widely applied mechanisms that generate power-law distributions. However, there are a number of others that deserve a mention. One that has been receiving some attention recently is the highly optimized tolerance mechanism of Carlson and Doyle [64, 65]. The classic example of this mechanism is again a model of forest fires and is based on the percolation process. Suppose again that fires start at random in a grid-like forest, just as we considered in Sec. IV.F, but suppose now that instead of appearing at random, trees are deliberately planted by a knowledgeable forester. One can ask what the best distribution of trees is to optimize the amount of lumber the forest produces, subject to random fires that could start at any place. The answer turns out to be that one should plant trees in blocks, with narrow firebreaks between them to prevent fires from spreading. Moreover, one should make the blocks smaller in regions where fires start more often and larger where fires are rare. The reason for this is that we waste some valuable space by making firebreaks, space in which we could have planted more trees. If fires are rare, then on average it pays to put the breaks further apart—more trees will burn if there is a fire, but we also get more lumber if there isn't.

Carlson and Doyle show both by analytic arguments and by numerical simulation that for quite general distributions of starting points for fires this process leads to a distribution of fire sizes that approximately follows a power law. The distribution is not a perfect power law in this case, but on the other hand neither are many of those seen in the data of Fig. 4, so this is not necessarily a disadvantage. Carlson and Doyle have proposed that highly optimized tolerance could be a model not only for forest fires but also for the sizes of files on the world wide web, which appear to follow a power law [6].

Another mechanism, which is mathematically similar to that of Carlson and Doyle but quite different in motivation, is the coherent noise mechanism proposed by Sneppen and Newman [66] as a model of biological extinction. In this mechanism a number of agents or species are subjected to stresses of various sizes, and each agent has a threshold for stress above which an applied stress will wipe that agent out—the species becomes extinct. Extinct species are replaced by new ones with randomly chosen thresholds. The net result is that the system self-organizes to a state where most of the surviving species have high thresholds, but the exact distribution depends on the distribution of stresses in a way very similar to the relation between block sizes and fire frequency in highly optimized tolerance. No conscious optimization is needed in this case, but the end result is similar: the overall distribution of the numbers of species becoming extinct as a result of any particular stress approximately follows a power law. The power-law form is not exact, but it's as good as that seen in real extinction data. Sneppen and Newman have also suggested that their mechanism could be used to model avalanches and earthquakes.

One of the broad distributions mentioned in Sec. II.B as an alternative to the power law was the log-normal. A log-normally distributed quantity is one whose logarithm is normally distributed. That is

p(ln x) ∼ exp[−(ln x − µ)²/(2σ²)],   (81)

for some choice of the mean µ and standard deviation σ of the distribution. Distributions like this typically arise when we are multiplying together random numbers. The log of the product of a large number of random numbers is the sum of the logarithms of those same random numbers, and by the central limit theorem such sums have a normal distribution essentially regardless of the distribution of the individual numbers.

But Eq. (81) implies that the distribution of x itself is

p(x) = p(ln x) (d ln x / dx) = (1/x) exp[−(ln x − µ)²/(2σ²)].   (82)

To see how this looks if we were to plot it on log scales, we take logarithms of both sides, giving

ln p(x) = −ln x − (ln x − µ)²/(2σ²)
        = −(ln x)²/(2σ²) + (µ/σ² − 1) ln x − µ²/(2σ²),   (83)

which is quadratic in ln x. However, any quadratic curve looks straight if we view a sufficiently small portion of it, so p(x) will look like a power-law distribution when we look at a small portion on log scales. The effective exponent α of the distribution is in this case not fixed by the theory—it could be anything, depending on which part of the quadratic our data fall on.

On larger scales the distribution will have some downward curvature, but so do many of the distributions claimed to follow power laws, so it is possible that these distributions are really log-normal. In fact, in many cases we don't even have to restrict ourselves to a particularly small portion of the curve. If σ is large then the quadratic term in Eq. (83) will vary slowly and the curvature of the line will be slight, so the distribution will appear to follow a power law over relatively large portions of its range. This situation arises commonly when we are considering products of random numbers.

Suppose for example that we are multiplying together 100 numbers, each of which is drawn from some distribution such that the standard deviation of the logs is around 1—i.e., the numbers themselves vary up or down by about a factor of e. Then, by the central limit theorem, the standard deviation for ln x will be σ ≃ 10 and ln x will have to vary by about ±10 for changes in (ln x)²/σ² to be apparent. But such a variation in the logarithm corresponds to a variation in x of more than four orders of magnitude. If our data span a domain smaller than this, as many of the plots in Fig. 4 do, then we will see a measured distribution that looks close to a power law. And the range will get quickly larger as the number of numbers we are multiplying grows.
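This argument is easy to illustrate numerically. The short sketch below (an added illustration; the particular choices of 100 factors and a four-decade window simply mirror the example above) multiplies together 100 random factors whose logarithms have standard deviation 1 and then measures the local slope of the resulting histogram on logarithmic scales; over a window of a few decades the slope is nearly constant, so the log-normal is easily mistaken for a power law.

import numpy as np

rng = np.random.default_rng(0)

# x is a product of 100 random factors whose logs have standard deviation 1,
# so ln x is approximately normal with sigma ~ 10 (central limit theorem).
n_samples, n_factors = 1_000_000, 100
log_x = np.zeros(n_samples)
for _ in range(n_factors):
    log_x += rng.normal(0.0, 1.0, size=n_samples)   # add the log of one factor
x = np.exp(log_x)

# Histogram p(x) in logarithmic bins over four decades and look at the local
# log-log slope, i.e. the "effective exponent" of the apparent power law.
bins = np.logspace(0, 4, 21)
hist, edges = np.histogram(x, bins=bins, density=True)
centres = np.sqrt(edges[:-1] * edges[1:])
good = hist > 0
slopes = np.diff(np.log(hist[good])) / np.diff(np.log(centres[good]))
print(slopes)   # nearly constant across the window: the curve looks straight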
One example of a random multiplicative process might be wealth generation by investment. If a person invests money, for instance in the stock market, they will get a percentage return on their investment that varies over time. In other words, in each period of time their investment is multiplied by some factor which fluctuates from one period to the next. If the fluctuations are random and uncorrelated, then after many such periods the value of the investment is the initial value multiplied by the product of a large number of random numbers, and therefore should be distributed according to a log-normal. This could explain why the tail of the wealth distribution, Fig. 4j, appears to follow a power law.

Another example is fragmentation. Suppose we break a stick of unit length into two parts at a position which is a random fraction z of the way along the stick's length. Then we break the resulting pieces at random again and so on. After many breaks, the length of one of the remaining pieces will be ∏_i z_i, where z_i is the position of the ith break. This is a product of random numbers and thus the resulting distribution of lengths should follow a power law over a portion of its range. A mechanism like this could, for instance, produce a power-law distribution of meteors or other interplanetary rock fragments, which tend to break up when they collide with one another, and this in turn could produce a power-law distribution of the sizes of meteor craters similar to the one in Fig. 4g.

In fact, as discussed by a number of authors [67, 68, 69], random multiplication processes can also generate perfect power-law distributions with only a slight modification: if there is a lower bound on the value that the product of a set of numbers is allowed to take (for example if there is a "reflecting boundary" on the lower end of the range, or an additive noise term as well as a multiplicative one) then the behaviour of the process is modified to generate not a log-normal, but a true power law.
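A minimal version of this modification looks something like the sketch below (again an added illustration; the drift, step size and boundary are arbitrary choices): many independent quantities are repeatedly multiplied by random factors but are never allowed to fall below a floor x_min. The long-run distribution then acquires a genuine power-law tail, rather than the log-normal produced by the unconstrained process.

import numpy as np

rng = np.random.default_rng(0)

# Multiplicative random walk with a lower "reflecting" bound: each value is
# repeatedly multiplied by a random factor (with a slight downward drift)
# but is never allowed to fall below x_min.
n_walkers, n_steps, x_min = 200_000, 1_000, 1.0
x = np.full(n_walkers, 10.0)
for _ in range(n_steps):
    x *= np.exp(rng.normal(-0.5, 1.0, size=n_walkers))  # random factor, drifting downwards
    x = np.maximum(x, x_min)                             # the lower bound

# Unlike the free multiplicative process, the tail is now a true power law:
# with these parameter choices P(X >= x0) falls off roughly as 1/x0.
for x0 in [1, 10, 100, 1000, 10_000]:
    print(x0, np.mean(x >= x0))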
Finally, some processes show power-law distributions of times between events. The distribution of times between earthquakes and their aftershocks is one example. Such power-law distributions of times are observed in critical models and in the coherent noise mechanism mentioned above, but another possible explanation for their occurrence is a random extremal process or record dynamics. In this mechanism we consider how often a randomly fluctuating quantity will break its own record for the highest value recorded. For a quantity with, say, a Gaussian distribution, it is always in theory possible for the record to be broken, no matter what its current value, but the more often the record is broken the higher the record will get and the longer we will have to wait until it is broken again. As shown by Sibani and Littlewood [70], this non-stationary process gives a distribution of waiting times between the establishment of new records that follows a power law with exponent α = 1. Interestingly, this is precisely the exponent observed for the distribution of waiting times for aftershocks of earthquakes. Record dynamics has also been proposed as a model for the lifetimes of biological taxa [71].
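The record statistics themselves are easy to reproduce. The sketch below (an added illustration of the basic record process for independent Gaussian draws, not a reimplementation of the analysis in [70]) collects, from many independent sequences, the waiting times between successive records; binned logarithmically, their density falls off roughly as 1/t, consistent with the exponent α = 1 quoted above, apart from the cutoff imposed by the finite sequence length.

import numpy as np

rng = np.random.default_rng(0)

# Waiting times between records of independent Gaussian sequences.
n_sequences, seq_len = 1_000, 100_000
waits = []
for _ in range(n_sequences):
    x = rng.normal(size=seq_len)
    record_steps = np.flatnonzero(x == np.maximum.accumulate(x))
    waits.append(np.diff(record_steps))
waits = np.concatenate(waits)

# Density of waiting times in logarithmic bins: it should fall off roughly
# as 1/t until the finite sequence length cuts it off.
bins = np.logspace(0, 5, 11)
hist, _ = np.histogram(waits, bins=bins, density=True)
print(hist)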
V. CONCLUSIONS

In this review I have discussed the power-law statistical distributions seen in a wide variety of natural and man-made phenomena, from earthquakes and solar flares to populations of cities and sales of books. We have seen many examples of power-law distributions in real data and seen how to analyse those data to understand the behaviour and parameters of the distributions. I have also described a number of physical mechanisms that have been proposed to explain the occurrence of power laws. Perhaps the two most important of these are:

1. The Yule process, a rich-get-richer mechanism in which the most populous cities or best-selling books get more inhabitants or sales in proportion to the number they already have. Yule and later Simon showed mathematically that this mechanism produces what is now called the Yule distribution, which follows a power law in its tail.

2. Critical phenomena and the associated concept of self-organized criticality, in which a scale factor of a system diverges, either because we have tuned the system to a special critical point in its parameter space or because the system automatically drives itself to that point by some dynamical process. The divergence can leave the system with no appropriate scale factor to set the size of some measured quantity and, as we have seen, the quantity must then follow a power law.

The study of power-law distributions is an area in which there is considerable current research interest. While the mechanisms and explanations presented here certainly offer some insight, there is much work to be done both experimentally and theoretically before we can say we really understand the physical processes driving these systems. Without doubt there are many exciting discoveries still waiting to be made.

Acknowledgements

The author thanks Jean-Philippe Bouchaud, Petter Holme, Cris Moore, Cosma Shalizi, Eduardo Sontag, Didier Sornette, and Erik van Nimwegen for useful conversations and suggestions, and Lada Adamic for the web site hit data. This work was funded in part by the National Science Foundation under grant number DMS–0405348.
B. Maximum likelihood estimate of exponents
= [e^(−b) b^(−2−n) (n + 1 + b) Γ(n + 1)] / [e^(−b) b^(−1−n) Γ(n + 1)]
= (n + 1 + b)/b,   (B8)

and

⟨α²⟩ = ∫_1^∞ e^(−bα) (α − 1)^n α² dα / ∫_1^∞ e^(−bα) (α − 1)^n dα
= [e^(−b) b^(−3−n) (n² + 3n + b² + 2b + 2nb + 2) Γ(n + 1)] / [e^(−b) b^(−1−n) Γ(n + 1)]
= (n² + 3n + b² + 2b + 2nb + 2)/b²,   (B9)

where Γ(x) is the Γ-function of Eq. (21). Then the variance of α is

σ² = ⟨α²⟩ − ⟨α⟩²
= (n² + 3n + b² + 2b + 2nb + 2)/b² − (n + 1 + b)²/b²
= (n + 1)/b²,   (B10)

and the error on α is

σ = √(n + 1)/b = √(n + 1) [Σ_i ln(x_i/x_min)]^(−1).   (B11)

In most cases we will have n ≫ 1 and it is safe to approximate n + 1 by n, giving

σ = √n [Σ_i ln(x_i/x_min)]^(−1) = (α − 1)/√n,   (B12)

where α in this expression is the maximum likelihood estimate from Eq. (B6).

References

[1] F. Auerbach, Das Gesetz der Bevölkerungskonzentration. Petermanns Geographische Mitteilungen 59, 74–76 (1913).
[2] G. K. Zipf, Human Behaviour and the Principle of Least Effort. Addison-Wesley, Reading, MA (1949).
[3] B. Gutenberg and C. F. Richter, Frequency of earthquakes in California. Bulletin of the Seismological Society of America 34, 185–188 (1944).
[4] G. Neukum and B. A. Ivanov, Crater size distributions and impact probabilities on Earth from lunar, terrestrial-planet, and asteroid cratering data. In T. Gehrels (ed.), Hazards Due to Comets and Asteroids, pp. 359–416, University of Arizona Press, Tucson, AZ (1994).
[5] E. T. Lu and R. J. Hamilton, Avalanches and the distribution of solar flares. Astrophysical Journal 380, 89–92 (1991).
[6] M. E. Crovella and A. Bestavros, Self-similarity in World Wide Web traffic: Evidence and possible causes. In B. E. Gaither and D. A. Reed (eds.), Proceedings of the 1996 ACM SIGMETRICS Conference on Measurement and Modeling of Computer Systems, pp. 148–159, Association of Computing Machinery, New York (1996).
[7] D. C. Roberts and D. L. Turcotte, Fractality and self-organized criticality of wars. Fractals 6, 351–357 (1998).
[8] J. B. Estoup, Gammes Stenographiques. Institut Stenographique de France, Paris (1916).
[9] D. H. Zanette and S. C. Manrubia, Vertical transmission of culture and the distribution of family names. Physica A 295, 1–8 (2001).
[10] A. J. Lotka, The frequency distribution of scientific production. J. Wash. Acad. Sci. 16, 317–323 (1926).
[11] D. J. de S. Price, Networks of scientific papers. Science 149, 510–515 (1965).
[12] L. A. Adamic and B. A. Huberman, The nature of markets in the World Wide Web. Quarterly Journal of Electronic Commerce 1, 512 (2000).
[13] R. A. K. Cox, J. M. Felton, and K. C. Chung, The concentration of commercial success in popular music: an analysis of the distribution of gold records. Journal of Cultural Economics 19, 333–340 (1995).
[14] R. Kohli and R. Sah, Market shares: Some power law results and observations. Working paper 04.01, Harris School of Public Policy, University of Chicago (2003).
[15] J. C. Willis and G. U. Yule, Some statistics of evolution and geographical distribution in plants and animals, and their significance. Nature 109, 177–179 (1922).
[16] V. Pareto, Cours d'Economie Politique. Droz, Geneva (1896).
[17] G. B. West, J. H. Brown, and B. J. Enquist, A general model for the origin of allometric scaling laws in biology. Science 276, 122–126 (1997).
[18] D. Sornette, Critical Phenomena in Natural Sciences, chapter 14. Springer, Heidelberg, 2nd edition (2003).
[19] M. Mitzenmacher, A brief history of generative models for power law and lognormal distributions. Internet Mathematics 1, 226–251 (2004).
[20] M. L. Goldstein, S. A. Morris, and G. G. Yen, Problems with fitting to the power-law distribution. Eur. Phys. J. B 41, 255–258 (2004).
[21] H. Dahl, Word Frequencies of Spoken American English. Verbatim, Essex, CT (1979).
[22] S. Redner, How popular is your paper? An empirical study of the citation distribution. Eur. Phys. J. B 4, 131–134 (1998).
[23] A. P. Hackett, 70 Years of Best Sellers, 1895–1965. R. R. Bowker Company, New York, NY (1967).
[24] W. Aiello, F. Chung, and L. Lu, A random graph model for massive graphs. In Proceedings of the 32nd Annual ACM Symposium on Theory of Computing, pp. 171–180, Association of Computing Machinery, New York (2000).
[25] H. Ebel, L.-I. Mielsch, and S. Bornholdt, Scale-free topology of e-mail networks. Phys. Rev. E 66, 035103 (2002).
[26] B. A. Huberman and L. A. Adamic, Information dynamics in the networked world. In E. Ben-Naim, H. Frauenfelder, and Z. Toroczkai (eds.), Complex Networks, number 650 in Lecture Notes in Physics, pp. 371–398, Springer, Berlin (2004).
[27] M. Small and J. D. Singer, Resort to Arms: International and Civil Wars, 1816–1980. Sage Publications, Beverly Hills (1982).
[28] S. Miyazima, Y. Lee, T. Nagamine, and H. Miyajima, Power-law distribution of family names in Japanese societies. Physica A 278, 282–288 (2000).
[29] B. J. Kim and S. M. Park, Distribution of Korean family names. Preprint cond-mat/0407311 (2004).
[30] J. Chen, J. S. Thorp, and M. Parashar, Analysis of electric power disturbance data. In 34th Hawaii International Conference on System Sciences, IEEE Computer Society (2001).
[31] B. A. Carreras, D. E. Newman, I. Dobson, and A. B. Poole, Evidence for self-organized criticality in electric power system blackouts. In 34th Hawaii International Conference on System Sciences, IEEE Computer Society (2001).
[32] E. Limpert, W. A. Stahel, and M. Abbt, Log-normal distributions across the sciences: Keys and clues. Bioscience 51, 341–352 (2001).
[33] M. E. J. Newman, S. Forrest, and J. Balthrop, Email networks and the spread of computer viruses. Phys. Rev. E 66, 035101 (2002).
[34] M. O. Lorenz, Methods of measuring the concentration of wealth. Publications of the American Statistical Association 9, 209–219 (1905).
[35] H. A. Simon, On a class of skew distribution functions. Biometrika 42, 425–440 (1955).
[36] G. U. Yule, A mathematical theory of evolution based on the conclusions of Dr. J. C. Willis. Philos. Trans. R. Soc. London B 213, 21–87 (1925).
[37] G. A. Miller, Some effects of intermittent silence. American Journal of Psychology 70, 311–314 (1957).
[38] W. Li, Random texts exhibit Zipf's-law-like word frequency distribution. IEEE Transactions on Information Theory 38, 1842–1845 (1992).
[39] C. E. Shannon, A mathematical theory of communication I. Bell System Technical Journal 27, 379–423 (1948).
[40] C. E. Shannon, A mathematical theory of communication II. Bell System Technical Journal 27, 623–656 (1948).
[41] T. M. Cover and J. A. Thomas, Elements of Information Theory. John Wiley, New York (1991).
[42] B. B. Mandelbrot, An information theory of the statistical structure of languages. In W. Jackson (ed.), Symp. Applied Communications Theory, pp. 486–502, Butterworth, Woburn, MA (1953).
[43] W. J. Reed and B. D. Hughes, From gene families and genera to incomes and internet file sizes: Why power laws are so common in nature. Phys. Rev. E 66, 067103 (2002).
[44] J.-P. Bouchaud, More Lévy distributions in physics. In M. F. Shlesinger, G. M. Zaslavsky, and U. Frisch (eds.), Lévy Flights and Related Topics in Physics, number 450 in Lecture Notes in Physics, Springer, Berlin (1995).
[45] N. Jan, L. Moseley, T. Ray, and D. Stauffer, Is the fossil record indicative of a critical system? Adv. Complex Syst. 2, 137–141 (1999).
[46] D. Sornette, Mechanism for powerlaws without self-organization. Int. J. Mod. Phys. C 13, 133–136 (2001).
[47] R. H. Swendsen and J.-S. Wang, Nonuniversal critical dynamics in Monte Carlo simulations. Phys. Rev. Lett. 58, 86–88 (1987).
[48] K. Sneppen, P. Bak, H. Flyvbjerg, and M. H. Jensen, Evolution as a self-organized critical phenomenon. Proc. Natl. Acad. Sci. USA 92, 5209–5213 (1995).
[49] M. E. J. Newman and R. G. Palmer, Modeling Extinction. Oxford University Press, Oxford (2003).
[50] D. J. de S. Price, A general theory of bibliometric and other cumulative advantage processes. J. Amer. Soc. Inform. Sci. 27, 292–306 (1976).
[51] P. L. Krapivsky, S. Redner, and F. Leyvraz, Connectivity of growing random networks. Phys. Rev. Lett. 85, 4629–4632 (2000).
[52] A.-L. Barabási and R. Albert, Emergence of scaling in random networks. Science 286, 509–512 (1999).
[53] S. N. Dorogovtsev, J. F. F. Mendes, and A. N. Samukhin, Structure of growing networks with preferential linking. Phys. Rev. Lett. 85, 4633–4636 (2000).
[54] R. K. Merton, The Matthew effect in science. Science 159, 56–63 (1968).
[55] P. J. Reynolds, W. Klein, and H. E. Stanley, A real-space renormalization group for site and bond percolation. J. Phys. C 10, L167–L172 (1977).
[56] K. G. Wilson and J. Kogut, The renormalization group and the ε-expansion. Physics Reports 12, 75–199 (1974).
[57] P. Bak, C. Tang, and K. Wiesenfeld, Self-organized criticality: An explanation of the 1/f noise. Phys. Rev. Lett. 59, 381–384 (1987).
[58] B. Drossel and F. Schwabl, Self-organized critical forest-fire model. Phys. Rev. Lett. 69, 1629–1632 (1992).
[59] P. Grassberger, Critical behaviour of the Drossel-Schwabl forest fire model. New Journal of Physics 4, 17 (2002).
[60] P. Bak, How Nature Works: The Science of Self-Organized Criticality. Copernicus, New York (1996).
[61] P. Bak and C. Tang, Earthquakes as a self-organized critical phenomenon. Journal of Geophysical Research 94, 15635–15637 (1989).
[62] Z. Olami, H. J. S. Feder, and K. Christensen, Self-organized criticality in a continuous, nonconservative cellular automaton modeling earthquakes. Phys. Rev. Lett. 68, 1244–1247 (1992).
[63] P. Bak and K. Sneppen, Punctuated equilibrium and criticality in a simple model of evolution. Phys. Rev. Lett. 74, 4083–4086 (1993).
[64] J. M. Carlson and J. Doyle, Highly optimized tolerance: A mechanism for power laws in designed systems. Phys. Rev. E 60, 1412–1427 (1999).
[65] J. M. Carlson and J. Doyle, Highly optimized tolerance: Robustness and design in complex systems. Phys. Rev. Lett. 84, 2529–2532 (2000).
[66] K. Sneppen and M. E. J. Newman, Coherent noise, scale invariance and intermittency in large systems. Physica D 110, 209–222 (1997).
[67] D. Sornette and R. Cont, Convergent multiplicative processes repelled from zero: Power laws and truncated power laws. Journal de Physique I 7, 431–444 (1997).
[68] D. Sornette, Multiplicative processes and power laws. Phys. Rev. E 57, 4811–4813 (1998).
[69] X. Gabaix, Zipf's law for cities: An explanation. Quarterly Journal of Economics 114, 739–767 (1999).
[70] P. Sibani and P. B. Littlewood, Slow dynamics from noise adaptation. Phys. Rev. Lett. 71, 1482–1485 (1993).
[71] P. Sibani, M. R. Schmidt, and P. Alstrøm, Fitness optimization and decay of the extinction rate through biological evolution. Phys. Rev. Lett. 75, 2055–2058 (1995).