ADVANCED ROBUST AND NONPARAMETRIC METHODS IN EFFICIENCY ANALYSIS
Series Editors:
Rolf Färe
Shawna Grosskopf
Oregon State University
R. Robert Russell
University of California, Riverside
by
Cinzia Daraio
Institute of Informatics and Telematics (CNR), Pisa
and
Department of Electrical Systems and Automation,
School of Engineering, University of Pisa
Léopold Simar
Institute of Statistics
Université Catholique de Louvain, Louvain-la-Neuve, Belgium
Library of Congress Control Number: 2006927067
ISBN-10: 0-387-35155-8 e-ISBN 0-387-35231-7
ISBN-13: 978-0387-35155-1 e-ISBN-13: 978-0387-35231-1
springer.com
Ai miei genitori
Luisa e Innocenzo Daraio
À ma famille
Contents
Dedication
List of Figures
List of Tables
Preface
Acknowledgments
1. INTRODUCTION
1.1 What this work is about
1.2 Improving the nonparametric approach in frontier analysis
1.3 An outline of the work
Part I Methodology
2. THE MEASUREMENT OF EFFICIENCY
2.1 Productivity and Efficiency
2.2 A short history of thought
2.3 The economic model
2.4 A taxonomy of efficient frontier models
2.5 The nonparametric frontier approach
2.5.1 Data Envelopment Analysis (DEA)
2.5.2 Free Disposal Hull (FDH)
2.6 Recent developments in nonparametric efficiency analysis
3. STATISTICAL INFERENCE IN NONPARAMETRIC FRONTIER ESTIMATION
3.1 Statistical foundation
3.2 Introducing stochastic noise in the model
Part II Applications
6. ECONOMIES OF SCALE, SCOPE AND EXPERIENCE IN THE ITALIAN MOTOR-VEHICLE SECTOR
6.1 Introduction
6.2 Data description
6.2.1 Definition of outputs and inputs
6.2.2 An exploratory investigation
6.2.3 Aggregation of inputs and outputs
6.3 Testing returns to scale and bootstrapping efficiency scores
6.4 Economies of scale
6.5 Economies of scope
6.6 Economies of experience
6.7 Conclusions
7. AGE, SCALE AND CONCENTRATION EFFECTS IN A PUBLIC RESEARCH SYSTEM
7.1 Introduction
7.2 Data description
7.3 Scale and concentration effects
7.4 Age effects on CNR scientific productivity
7.5 Robust parametric approximation of multioutput distance function
7.6 Conclusions
8. EXPLORING THE EFFECTS OF MANAGER TENURE, FUND AGE AND THEIR INTERACTION
8.1 Introduction
8.2 Data description
8.3 Impact of mutual fund manager tenure on performance
8.4 Interaction between manager tenure and fund age
Preface
The topic of production efficiency has attracted attention since Adam Smith’s
pin factory and even before. However, a rigorous analytical approach to the
measurement of efficiency in production originated with the work of Koopmans
(1951) and Debreu (1951), empirically applied by Farrell (1957). Farrell’s
seminal work gave rise to a considerable number of studies.
The basic idea of efficiency analysis is to make a comparison among a group
of firms or branches or among Decision Making Units (DMUs), in order to
evaluate how the resources (or inputs) are used to obtain (produce) the products
(services or outputs). This evaluation process is based on the estimation of a
benchmark frontier against which the DMUs are assessed, using DMUs’ inputs
and outputs. The level of efficiency of each DMU is gauged as the distance
from the estimated (‘efficient’) frontier.
In the literature on efficiency analysis, the nonparametric approach has received
considerable interest, from both a theoretical and an applied perspective. This is
mainly because it requires few assumptions and, in particular, does not need the
specification of a functional form for the frontier. Hence, no parameters of a
functional form have to be estimated in this approach, from which the name
‘nonparametric’ derives, whereas in the parametric approach the parameters of
the efficient frontier must be estimated. Data Envelopment Analysis (DEA) and
Free Disposal Hull (FDH) are among the best known and most widely applied
nonparametric techniques for measuring efficiency in production and service
activities (see e.g., Cooper, Seiford and Tone, 2000, for about 1,500 references
to their applications). Nevertheless, this traditional nonparametric approach
(DEA/FDH based) presents some severe limitations that are not always taken
into account by researchers who apply it in empirical work. These limitations
should be carefully considered in order to interpret the results correctly.
This book has been specifically designed for applied economists who have
an interest in the advantages of traditional nonparametric methods (DEA/FDH)
for efficiency analysis, but are sceptical about adopting them because of the
drawbacks they present.
In Part II of the book (Applications), we propose three empirical illustrations
taken from different economic fields: the insurance sector, scientific research and
the mutual funds industry. These applications illustrate how the tools
we propose can be used to analyse economies of scale, economies of scope,
dynamics of age and agglomeration effects, and trade-offs in production and service
activities, to compare groups, and to help explain efficiency differentials.
These extensively treated empirical applications, based on real data, show the
usefulness of our approach in applied economics. Through these applications
we illustrate how various statistical tools can be combined to shed light on the
key features of the production process under study.
Moreover, this book has also been written for researchers with a background
in Operations Research (OR) and/or Management Science (MS) who would
like to deepen their knowledge of these new robust and nonparametric techniques,
which have been presented at specialised conferences and have appeared in
scientific journals in recent years. In this book they will find
a readable, concise yet accurate presentation of these recent advances -
without the burden of technicalities and formal proofs - together with
an extensive illustration of their use in empirical work.
Acknowledgments
This book is the result of a scientific collaboration between the two authors
that started in 2000, during the Ph.D. of Cinzia Daraio at the Scuola Superiore
Sant’Anna of Pisa, Italy, and was reinforced by her Master of Arts in Statistics,
pursued at the Université Catholique de Louvain, Belgium, in 2001-2002. Within
this research collaboration a series of papers, published or forthcoming, as well
as several joint research projects have been completed or are in progress. This
book can be viewed as a milestone of this continuing and active synergy.
The research activity from which this book originated has been funded by
various national and international research funds. In particular the financial
support of the following projects is gratefully acknowledged:
The Interuniversity Attraction Pole, Phase V (No. P5/24) from the Belgian
Government (Belgian Science Policy);
The Italian national project (iRis) Reorganizing the Italian Public Research
system to foster technology transfer: governance, tools and implementa-
tion;
We would like to thank all our colleagues and friends who encouraged us
in the writing of this book. In particular we are grateful to Carlo Bianchi, An-
drea Bonaccorsi, Hal Fried, Giampiero M. Gallo, Shawna Grosskopf, Jacques
Mairesse, Henk Moed, Paula Stephan, Robert Russell and Paul Wilson.
Finally, special thanks go to Sean Lorre, Springer editor, and to Deborah
Doherty, Springer author support, for their kind cooperation.
Chapter 1
INTRODUCTION
of the production set (convexity implies that if two observations are possible,
then all the linear combinations that lie between them are also possible).
The preference for the nonparametric approach over the parametric approach
(based on the functional specification of the frontier) is due to the small number
of assumptions required, and mainly to the fact that we need not specify either
the functional form of the input-output relation or a distributional form for the
inefficiency term.
Nonetheless, traditional nonparametric estimators based on envelopment
techniques (i.e. DEA/FDH types) were for a long time limited by several drawbacks:
their deterministic (meaning that all deviations from the efficient frontier are
considered as inefficiency, and no noise is allowed) and non-statistical nature;
the influence of outliers and extreme values; the lack of parameters for economic
interpretation; and unsatisfactory techniques for introducing environmental
or external variables in the measurement of efficiency.
Our work treats at length recently introduced robust and nonparametric ap-
proaches in efficiency analysis which overcome most traditional limits of the
nonparametric approach listed above. In doing so, we provide computationally
feasible methods of calculation (both of the efficient frontier and of the distance
from it) and explanation of efficiency differentials.
We believe that the robust and nonparametric approach in frontier analysis
has reached a level of generality and has overcome most of its limits, so that it
can be considered as being more flexible and more suitable for the evaluation
of complex production and service activities, with respect to other approaches,
like the parametric approach.
The economic model underlying our robust and nonparametric frontier approach
is very general: it does not make any assumptions about the behaviour
of the firms (or DMUs) and does not introduce prices of factors, which are
considered as the link of DEA-based models with the neoclassical theory of
production (Ray, 2004). Moreover, in many empirical applications prices are
not available (as is the case for scientific research, several non-profit services,
and so on).
Our book is designed to fill a gap in the literature by systematically presenting
the recent developments of the nonparametric approach and illustrating its
usefulness for empirical research through three full economic applications. We
propose an intuitive and readable, yet rigorous, presentation
of advanced nonparametric and robust methods in efficiency analysis, without
the burden of technicalities and proofs. This methodology does not
impose any assumption on the behavior of firms and is therefore a general
and flexible tool suitable for applications both in theories of production that
generalize the neoclassical theory, and in alternative approaches.
The material contained in this work offers a background for researchers of
different disciplines.
Applied economists may be interested in the whole book, both the Method-
ology and the Applications parts.
Researchers in MS may well start their reading with the Applications, and
then go back to the Methodology, for a better understanding of the applied
techniques.
This book could also be adopted for specialised courses in efficiency analysis
for graduate students or advanced undergraduate students.
Limitations of nonparametric methods    Proposed advancements
Deterministic nature and no easy        Noise - Hall and Simar (2002); Simar (2003b).
inference                               Statistical properties - Simar and Wilson (2000a);
                                        Kneip, Simar and Wilson (2003).
                                        Probabilistic approach - Daraio and Simar (2005a, b)
                                        and this book.
In Part II, the three applications have a similar structure. The introductory
section presents the relevant literature on the topic and states the main
research questions addressed in the chapter. A second section then describes the
dataset and reports some descriptive statistics. In particular, Section 6.2 also
reports an exploratory Principal Component Analysis as well as a procedure to
aggregate inputs and outputs. After that, the various methods described in Part I
are applied in the different fields and the empirical results are discussed.
Finally, the concluding section of each chapter summarizes the main results
obtained and formulates policy implications.
Specifically, Chapter 6 deals with the Italian insurance industry. It focuses
on the motor vehicle sector. It provides tests on returns to scale, bootstrapped
confidence intervals for the efficiency estimates, and empirical evidence on
economies of scale, scope and experience.
Chapter 7 analyses a public research system: the research institutes of the
Italian National Research Council (CNR). Economies of scale, agglomeration
and age effects on scientific productivity are investigated, evaluating in partic-
ular the interaction between scale and agglomeration effects. A robust scale
elasticity is also estimated using the newly introduced multi-output parametric
approximation of robust nonparametric frontiers.
Finally, Chapter 8 focuses on US Aggressive Growth mutual funds. It examines
how manager tenure and fund age affect the performance of mutual funds.
The interaction of manager tenure and fund age is also assessed, providing
detailed results on groups of best and worst performers.
The last chapter sums up the main points and concludes the book.
Chapter 2
THE MEASUREMENT OF EFFICIENCY
The concept of efficiency is similar, but not identical, to that of productivity.
Even so, many authors in the efficiency literature make no distinction between
productivity and efficiency. For instance, Sengupta (1995) and Cooper, Seiford and Tone
(2000) define both productivity and efficiency as the ratio between output and
input.
Instead of defining the efficiency as the ratio between outputs and inputs, we
can describe it as a distance between the quantity of input and output, and the
quantity of input and output that defines a frontier, the best possible frontier for
a firm in its cluster (industry).
Efficiency and productivity are, in any case, two complementary concepts. The
measures of efficiency are more accurate than those of productivity in the sense
that they involve a comparison with the most efficient frontier, and so they
can complement those of productivity, which are based on the ratio of outputs to inputs.
Lovell (1993) defines the efficiency of a production unit in terms of a com-
parison between observed and optimal values of its output and input. The
comparison can take the form of the ratio of observed to maximum potential
output obtainable from the given input, or the ratio of minimum potential to
observed input required to produce the given output. In these two comparisons
the optimum is defined in terms of production possibilities, and efficiency is
technical.
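As a minimal numerical sketch of the two ratio comparisons just described, the following Python fragment may help. The function names and the figures are hypothetical, chosen only for illustration, and the optimal values are assumed to be known rather than estimated.

```python
def output_efficiency(observed_output: float, max_potential_output: float) -> float:
    """Ratio of observed to maximum potential output for the given input."""
    if observed_output < 0 or max_potential_output <= 0:
        raise ValueError("outputs must be non-negative and the potential output positive")
    return observed_output / max_potential_output


def input_efficiency(observed_input: float, min_potential_input: float) -> float:
    """Ratio of minimum potential to observed input for the given output."""
    if observed_input <= 0 or min_potential_input < 0:
        raise ValueError("observed input must be positive and the potential input non-negative")
    return min_potential_input / observed_input


# Hypothetical unit: it produces 80 where 100 would be attainable with its input,
# and it uses 50 where 40 would suffice for its output.
out_eff = output_efficiency(80.0, 100.0)
in_eff = input_efficiency(50.0, 40.0)
```

Both ratios equal 1 for a technically efficient unit and fall below 1 otherwise. In practice the potential output or input is not known and has to be estimated from a frontier.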
Koopmans (1951; p. 60) provided a definition of what we refer to as tech-
nical efficiency: an input-output vector is technically efficient if, and only if,
increasing any output or decreasing any input is possible only by decreasing
some other output or increasing some other input.
Farrell (1957; p. 255) and much later Charnes and Cooper (1985; p. 72) return
to the empirical necessity of treating Koopmans’ definition of technical
efficiency as a relative notion, a notion that is relative to best observed practice
in the reference set or comparison group. This provides a way of differentiating
efficient from inefficient production units, but it offers no guidance concerning
either the degree of inefficiency of an inefficient vector or the identification of
an efficient vector or combination of efficient vectors against which to compare
an inefficient vector.
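The relative version of Koopmans’ definition can be sketched as a dominance check among observed units. The data and the helper names below are hypothetical; the check compares each unit only with other observed units (not with combinations of them), which is one simple reading of "best observed practice".

```python
from typing import List, Sequence, Tuple

Unit = Tuple[Sequence[float], Sequence[float]]  # (input vector, output vector)


def dominates(a: Unit, b: Unit) -> bool:
    """True if a uses no more of any input and produces no less of any output
    than b, with at least one strict improvement."""
    (xa, ya), (xb, yb) = a, b
    no_worse = all(i <= j for i, j in zip(xa, xb)) and all(i >= j for i, j in zip(ya, yb))
    strictly_better = any(i < j for i, j in zip(xa, xb)) or any(i > j for i, j in zip(ya, yb))
    return no_worse and strictly_better


def undominated_units(units: List[Unit]) -> List[int]:
    """Indices of units that no other observed unit dominates."""
    return [k for k, u in enumerate(units) if not any(dominates(v, u) for v in units)]


# Hypothetical sample: two inputs, one output.
units = [
    ((2.0, 3.0), (10.0,)),  # unit 0
    ((3.0, 3.0), (10.0,)),  # unit 1: more of input 1 than unit 0, same output
    ((1.0, 5.0), (10.0,)),  # unit 2: a different input mix
    ((2.0, 3.0), (8.0,)),   # unit 3: same inputs as unit 0, less output
]
efficient = undominated_units(units)
```

The check only separates dominated from undominated units. As noted above, it says nothing about how inefficient a dominated unit is, which is what the measures introduced next are meant to quantify.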
Debreu (1951) offered the first measure of productive efficiency with his coef-
ficient of resource utilization. Debreu’s measure is a radial measure of technical
efficiency. Radial measures focus on the maximum feasible equiproportionate
reduction in all variable inputs, or the maximum feasible equiproportionate
expansion of all outputs. They are independent of unit of measurement.
When radial measures are applied, the achievement of the maximum feasible input
contraction or output expansion suggests technical efficiency, even though there
may remain slacks in inputs or surpluses in output. In economics the notion of
efficiency is related to the concept of Pareto optimality. An input-output bundle
is not Pareto optimal if there remains the opportunity of any net increase in
the industry production level is optimally allocated between the firms in the
short run. A broad interpretation of Farrell’s notion of structural efficiency can
be stated as follows: industry or cluster A is structurally more efficient than
industry B if the distribution of its best firms is more concentrated near the
efficient frontier in industry A than in industry B. In their empirical study, Bjurek, Hjalmarsson
and Forsund (1990) compute structural efficiency by simply constructing an
average unit for the whole cluster and then estimating the individual measure of
technical efficiency for this average unit. On more general aggregation issues,
see Färe and Zelenyuk (2003) and Färe and Grosskopf (2004, p. 94 ff).
1 This section is based on Färe, Grosskopf and Lovell (1994), pp. 1-23; and Kumbhakar and Lovell (2000),
pp. 5-7.
Linear programming techniques are also used in production analysis for non-
parametric ‘tests’2 on regularity conditions and behavioral objectives. Afriat
(1972) developed a series of consistency ‘tests’ on production data by assuming
an increasing number of more restrictive regularity hypotheses on production
technology. In so doing he expanded his previous work on utility functions
(Afriat 1967) based on the revealed preference analysis (Samuelson, 1948).
These ‘tests’ of consistency, as well as similar ‘tests’ of hypotheses proposed
by Hanoch and Rothschild (1972), are all based on linear programming formu-
lations. Diewert and Parkan (1983) suggested that this battery of tools could be
used as a screening device to construct frontiers and measure efficiency of data
relative to the constructed frontiers. Varian (1984, 1985, 1990) and Banker and
Maindiratta (1988) extended the Diewert and Parkan approach. In particular,
Varian seeks to reduce the “all-or-nothing” nature of the tests - either data pass
a test or they do not - by developing a framework for allowing small failures to
be attributed to measurement error in the data rather than to failure of the
hypothesis under investigation.
All these studies use nonparametric linear programming models to explore
the consistency of a dataset, or a subset of a dataset, with a structural (e.g.
constant returns to scale), parametric (e.g. Cobb-Douglas) or behavioral (e.g.
cost minimization) hypothesis. These tools, originally proposed as screening
devices to check for data accuracy, also provide guidance in the selection of
parametric functional forms as well as procedures useful to construct frontiers
and measure efficiency. The problem of nonparametric exploration of regularity
conditions and behavioral objectives has also been treated by Chavas and Cox
(1988, 1990), Ray (1991), and Ray and Bhadra (1993).
Some works have indirectly influenced the development of efficiency
and productivity analysis. Hicks (1935, p. 8) states his “easy life” hypothesis
as follows: “people in monopolistic positions [...] are likely to exploit their
advantage much more by not bothering to get very near the position of maximum
profit, than by straining themselves to get very close to it. The best of all
monopoly profits is a quiet life”. Hicks’s suggestion, i.e. that
the absence of competitive pressure might allow producers the freedom not to
fully optimize conventional objectives and, by implication, that the presence
of competitive pressure might force producers to do so, has been adopted by
many authors (see e.g. Alchian and Kessel, 1962, and Williamson, 1964).
Another field of work related to the efficiency literature is the property rights
field of research, which asserts that public production is inherently less efficient
than private production. This argument, due originally to Alchian (1965),
states that concentration and transferability of private ownership shares create
2 Here and below when we use the word test between quotation mark we mean qualitative indicators that are
3 This expectation is based on a rich theoretical literature. See e.g. the “classical” survey by Holmstrom and
Tirole (1989).
4 See also Färe and Grosskopf (2004), pp.151-161.
in utilizing the minimum inputs required to produce the outputs they choose
to produce, given the technology at their disposal. In light of the evident fail-
ure of at least some producers to optimize, it is desirable to recast the analysis
of production away from the traditional production function approach toward
a frontier based approach. Hence we are concerned with the estimation of
frontiers, which envelop data, rather than with functions, which intersect data.
In this setting, the main purpose of productivity analysis studies is to evaluate
numerically the performance of a certain number of firms (or business units or
Decision Making Units, DMU) from the point of view of technical efficiency,
i.e. their ability to operate close to, or on the boundary of their production set.
The problem to be analyzed is thus set in terms of physical input and output
quantities.
We assume that the data are in cross-sectional form, and that for each firm we
observe the values of the inputs and outputs used in the production process. Measuring
efficiency for any data set of this kind requires first determining what the
boundary of the production set can be, and then measuring the distance between
any observed point and the boundary of the production set.
Given a list of p inputs and q outputs, in economic analysis the operations of
any productive organization can be defined by means of a set of points, Ψ, the
production set, defined as follows in the Euclidean space $\mathbb{R}^{p+q}_+$:
\[ \Psi = \{(x, y) \in \mathbb{R}^{p+q}_+ \mid (x, y) \text{ is feasible}\}, \]
where x is the input vector, y is the output vector and “feasibility” of the vector
(x, y) means that, within the organization under consideration, it is physically
possible to obtain the output quantities $y_1, \dots, y_q$ when the input quantities
$x_1, \dots, x_p$ are being used (all quantities being measured per unit of time). It is
useful to define the set Ψ in terms of its sections, defined as the images of a
relation between the input and the output vectors that are the elements of Ψ.
We can then define the input requirement set (for all y ∈ Ψ) as:
\[ C(y) = \{x \in \mathbb{R}^{p}_+ \mid (x, y) \in \Psi\}. \]
An input requirement set C(y) consists of all input vectors that can produce the
output vector $y \in \mathbb{R}^{q}_+$.
The output correspondence set (for all x ∈ Ψ) can be defined as:
\[ P(x) = \{y \in \mathbb{R}^{q}_+ \mid (x, y) \in \Psi\}. \]
P(x) consists of all output vectors that can be produced by a given input vector
$x \in \mathbb{R}^{p}_+$.
The production set Ψ can also be retrieved from the input sets, specifically:
\[ \Psi = \{(x, y) \mid x \in C(y),\ y \in \mathbb{R}^{q}_+\}. \]
5 Here and throughout inequalities involving vectors are defined componentwise, i.e. on an element-by-
element basis.
or equivalently6,
\[ \forall \alpha > 0, \quad C(\alpha y) = \alpha C(y). \]
From the above, DMUs are efficient, e.g. in an input-oriented framework,
if they are on the boundary of the input requirement set (or, for the output-oriented
case, on the boundary of the output correspondence set). In some cases,
however, these efficient firms may not be using the fewest possible inputs to
produce their outputs. This is the case where we have slacks. This is due to the
fact that the Pareto-Koopmans efficient subsets of the boundaries of C(y) and
P(x), i.e. eff C(y) and eff P(x), may not coincide with the Farrell-Debreu
boundaries ∂C(y) and ∂P(x), i.e.7:
\[ \operatorname{eff} C(y) = \{x \mid x \in C(y),\ x' \notin C(y)\ \forall x' \le x,\ x' \ne x\} \subseteq \partial C(y), \tag{2.8} \]
\[ \operatorname{eff} P(x) = \{y \mid y \in P(x),\ y' \notin P(x)\ \forall y' \ge y,\ y' \ne y\} \subseteq \partial P(x). \tag{2.9} \]
7 We give an illustration in Section 2.5 in Figure 2.2 where we describe DEA estimators of efficient frontier.
Once the efficient subsets of Ψ have been defined, we may define the efficiency
measure of a firm operating at the level $(x_0, y_0)$ by considering the distance
from this point to the frontier. There are several ways to achieve this, but a
simple way suggested by Farrell (1957), along the lines of Debreu (1951), is to use
a radial distance from the point to its corresponding frontier. In the following
we will concentrate our attention on radial measures of efficiency. Of course, we
may look at the efficient frontier in two directions: either in the input direction
(where the efficient subset is characterized by ∂C(y)) or in the output direction
(where the efficient subset is characterized by ∂P(x)).
The Farrell input measure of efficiency for a firm operating at level $(x_0, y_0)$
is defined as:
\[ \theta(x_0, y_0) = \inf\{\theta \mid \theta x_0 \in C(y_0)\} = \inf\{\theta \mid (\theta x_0, y_0) \in \Psi\}, \tag{2.10} \]
and its Farrell output measure of efficiency is defined as:
\[ \lambda(x_0, y_0) = \sup\{\lambda \mid \lambda y_0 \in P(x_0)\} = \sup\{\lambda \mid (x_0, \lambda y_0) \in \Psi\}. \tag{2.11} \]
So, $\theta(x_0, y_0) \le 1$ is the radial contraction of inputs the firm should achieve
to be considered as being input-efficient in the sense that $(\theta(x_0, y_0)x_0, y_0)$ is a
frontier point. In the same way $\lambda(x_0, y_0) \ge 1$ is the proportionate increase of
output the firm should achieve to be considered as being output-efficient in the
sense that $(x_0, \lambda(x_0, y_0)y_0)$ is on the frontier.
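A short worked example may make the two measures concrete. The technology below is an assumption made purely for illustration (one input, one output, with $\Psi = \{(x, y) \in \mathbb{R}^2_+ \mid y \le \sqrt{x}\}$); it is not taken from the text. For a firm operating at $(x_0, y_0) = (4, 1)$:

```latex
\begin{align*}
\theta(4, 1) &= \inf\{\theta \mid (4\theta, 1) \in \Psi\}
              = \inf\{\theta \mid 1 \le \sqrt{4\theta}\} = \tfrac{1}{4}, \\
\lambda(4, 1) &= \sup\{\lambda \mid (4, \lambda) \in \Psi\}
               = \sup\{\lambda \mid \lambda \le \sqrt{4}\} = 2 .
\end{align*}
```

The input-oriented projection is $(\theta x_0, y_0) = (1, 1)$ and the output-oriented projection is $(x_0, \lambda y_0) = (4, 2)$. Both lie on the frontier $y = \sqrt{x}$, but they are different points: the two orientations generally select different frontier targets for the same firm.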
It is interesting to note that the efficient frontier of Ψ, in the radial sense, can
be characterized as the units (x, y) such that θ(x, y) = 1, in the input direction
(belonging to ∂C(y)) and by the (x, y) such that λ(x, y) = 1, in the output
direction (belonging to ∂P (x)). If the frontier is continuous, frontier points are
such that θ(x, y) = λ(x, y) = 1. The efficient frontier is unique but we have
two ways to characterize it.
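The following sketch approximates the Farrell input measure numerically and illustrates that a unit with θ = 1 can still have slack in one input. Everything here is hypothetical: the feasibility function describes an assumed fixed-proportions technology with one output, $y \le \min(x_1, x_2)$, and the bisection assumes that feasibility is monotone in θ (inputs are freely disposable).

```python
from typing import Callable, Sequence

Feasible = Callable[[Sequence[float], Sequence[float]], bool]


def farrell_input_efficiency(x0: Sequence[float], y0: Sequence[float],
                             is_feasible: Feasible, tol: float = 1e-9) -> float:
    """Approximate inf{theta | (theta * x0, y0) in Psi} by bisection on [0, 1].

    Assumes (x0, y0) is feasible and that feasibility is monotone in theta.
    """
    if not is_feasible(x0, y0):
        raise ValueError("(x0, y0) must belong to the production set")
    lo, hi = 0.0, 1.0  # hi is always feasible; lo approaches the infimum from below
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if is_feasible([mid * v for v in x0], y0):
            hi = mid
        else:
            lo = mid
    return hi


def leontief(x: Sequence[float], y: Sequence[float]) -> bool:
    """Hypothetical technology: one output, feasible iff y <= min(x1, x2)."""
    return y[0] <= min(x)


# Inputs (4, 2) can be contracted radially by one half while still producing y = 1.
theta_a = farrell_input_efficiency([4.0, 2.0], [1.0], leontief)
# Inputs (2, 1) admit no radial contraction (theta = 1), yet (1, 1) also produces
# y = 1: input 1 carries a slack that the radial measure does not see.
theta_b = farrell_input_efficiency([2.0, 1.0], [1.0], leontief)
slack_remains = leontief([1.0, 1.0], [1.0])
```

Under these assumptions the second unit is on the radial frontier but is not efficient in the Pareto-Koopmans sense, which is the distinction drawn between eff C(y) and ∂C(y) above.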
It is sometimes easier to measure these radial distances by their inverse,
known as Shephard distance functions (Shephard, 1970). The Shephard input
distance function provides a normalized measure of Euclidean distance from a
point $(x, y) \in \mathbb{R}^{p+q}_+$ to the boundary of Ψ in a radial direction orthogonal to y
and is defined as:
\[ \delta^{\mathrm{in}}(x, y) = \sup\{\theta > 0 \mid (\theta^{-1} x, y) \in \Psi\} \equiv (\theta(x, y))^{-1}, \tag{2.12} \]
with $\delta^{\mathrm{in}}(x, y) \ge 1$, ∀(x, y) ∈ Ψ. Similarly, the Shephard output distance
function provides a normalized measure of Euclidean distance from a point
$(x, y) \in \mathbb{R}^{p+q}_+$ to the boundary of Ψ in a radial direction orthogonal to x:
\[ \delta^{\mathrm{out}}(x, y) = \inf\{\lambda > 0 \mid (x, \lambda^{-1} y) \in \Psi\} \equiv (\lambda(x, y))^{-1}. \]
The analysis of the existing literature is a necessary step for the advancement
of a discipline. This is particularly true for the field of efficiency and productivity
research, which in recent decades has seen an exponential increase
in the number of methodological and applied works. For a DEA bibliography
over 1978-1992, see Seiford (1994, 1996), and for an extension to 2001
see Gattoufi, Oral and Reisman (2004). In Cooper, Seiford and Tone (2000)
about 1,500 DEA references are reported. Other bibliographic studies include
Emrouznejad (2001) and Tavares (2002).
The econometric problem is thus how to estimate Ψ, and then ∂C(y), ∂P(x),
θ(x, y), λ(x, y), from a random sample of production units
$\mathcal{X} = \{(X_i, Y_i) \mid i = 1, \dots, n\}$.
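To make the estimation problem concrete, here is a minimal sketch of one nonparametric estimator of the input efficiency score, based on the Free Disposal Hull of the sample. The formula used, the minimum over units producing at least $y_0$ of the largest input ratio $X_i^j / x_0^j$, is the usual FDH expression and is stated here as an assumption, since this section does not define the estimator. The sample and the function name are hypothetical.

```python
from typing import List, Sequence, Tuple

Unit = Tuple[Sequence[float], Sequence[float]]  # (input vector X_i, output vector Y_i)


def fdh_input_efficiency(x0: Sequence[float], y0: Sequence[float],
                         sample: List[Unit]) -> float:
    """FDH-type estimate of the Farrell input measure at (x0, y0).

    Assumes all components of x0 are strictly positive.
    """
    ratios = []
    for xi, yi in sample:
        # Only units producing at least y0 in every output are comparable.
        if all(a >= b for a, b in zip(yi, y0)):
            # Smallest radial factor theta such that theta * x0 >= X_i componentwise.
            ratios.append(max(a / b for a, b in zip(xi, x0)))
    if not ratios:
        raise ValueError("no observed unit produces at least y0")
    return min(ratios)


# Hypothetical sample: two inputs, one output.
sample = [
    ((2.0, 4.0), (1.0,)),
    ((4.0, 2.0), (1.0,)),
    ((6.0, 6.0), (1.0,)),
    ((3.0, 3.0), (2.0,)),
]
theta_dominated = fdh_input_efficiency((6.0, 6.0), (1.0,), sample)
theta_frontier = fdh_input_efficiency((2.0, 4.0), (1.0,), sample)
```

Here the unit using (6, 6) is compared with the unit using (3, 3), which produces more with half of each input, so its estimated score is one half. The unit using (2, 4) is not dominated by any observed unit and receives a score of 1.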
Starting from the first empirical application of Farrell (1957), several different
approaches for efficient frontier estimation and efficiency score calculation have
been developed.8
In Figure 2.1 we propose an outline of what we believe have been the most
influential works in productivity and efficiency analysis, starting from the pi-
oneering work by Farrell (1957). Of course, our outline is far from being
complete and all-inclusive. Figure 2.1 shows some of the articles, books and
special issues of journals (i.e. Journal of Econometrics JE, Journal of Produc-
tivity Analysis JPA, European Journal of Operational Research, EJOR) that
have mainly influenced the writing of this work, trying to balance them accord-
ing to the adopted approach.
As is evident from Figure 2.1, we have taken into consideration mainly the
nonparametric approach, as we believe that, thanks to its latest developments, it
can be considered as being very flexible and very useful for modeling purposes.
We may classify efficient frontier models according to the following criteria:9
1 The specification of the (functional) form for the frontier function;
2 The presence of noise in the sample data;
3 The type of data analyzed.
The first criterion (functional form of the frontier) leads to the classification
into:
Parametric Models. In these models, the attainable set Ψ is defined through
a production frontier function, g(x, β), which is a known mathematical
function depending on some k unknown parameters, i.e. $\beta \in \mathbb{R}^k$, where
generally y is univariate, i.e. $y \in \mathbb{R}_+$. The main advantages of this
approach are the economic interpretation of parameters and the statistical
properties of estimators; more critical are the choice of the function g(x, β)
and the handling of multiple-input, multiple-output cases (for more on
this latter aspect see Section 4.7 below where we introduce multivariate
parametric approximations of nonparametric and robust frontiers).
Nonparametric Models. These models do not assume any particular functional
form for the frontier function g(x). The main advantages of this approach
are the robustness to model choice and the easy handling of the multiple-input,
multiple-output case; their main limitations are the estimation of an
unknown functional and the curse of dimensionality10, typical of nonparametric
methods.
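For the parametric case, a minimal sketch of a frontier function g(x, β) with one input may be useful. It assumes a Cobb-Douglas form, $g(x, \beta) = e^{a} x^{b}$, and fits it with corrected ordinary least squares (COLS): an OLS fit in logarithms whose intercept is then shifted upward so that no observation lies above the fitted frontier. The choice of COLS, the function name and the data are illustrative assumptions, not a method prescribed in this section.

```python
import math
from typing import List, Sequence, Tuple


def cols_frontier(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float, List[float]]:
    """Fit log y = a + b log x by OLS, then shift the intercept by the largest
    residual. Returns (a_cols, b, efficiency scores in (0, 1])."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mean_x, mean_y = sum(lx) / n, sum(ly) / n
    sxx = sum((u - mean_x) ** 2 for u in lx)
    b = sum((u - mean_x) * (v - mean_y) for u, v in zip(lx, ly)) / sxx
    a_ols = mean_y - b * mean_x
    residuals = [v - (a_ols + b * u) for u, v in zip(lx, ly)]
    shift = max(residuals)  # the best-practice unit defines the frontier
    a_cols = a_ols + shift
    efficiencies = [math.exp(r - shift) for r in residuals]
    return a_cols, b, efficiencies


# Hypothetical observations of one input and one output.
xs = [1.0, 2.0, 4.0, 8.0]
ys = [1.0, 1.2, 2.0, 2.0]
a_cols, b, efficiencies = cols_frontier(xs, ys)
```

The two estimated parameters have a direct economic reading (b is the output elasticity), which is the advantage noted above for parametric models. The price is that the result depends on the assumed form of g.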
8 For an introduction see e.g., Coelli, Rao and Battese (1998) and Thanassoulis (2001).
9 These criteria follow Simar and Wilson (2006b), where a comprehensive statistical approach is described.
10 The curse of dimensionality, shared by many nonparametric methods, means that to avoid large variances
[Figure 2.1. An outline of influential works in productivity and efficiency analysis, from Debreu (1951), Koopmans (1951), Shephard (1953, 1970) and Farrell (1957) onward, arranged by approach and including a semiparametric branch.]
\[ \operatorname{Prob}\{(X_i, Y_i) \in \Psi\} = 1 \]
for all i = 1, ..., n. The main weakness of this approach is the sensitivity
to “super-efficient” outliers. Robust estimators are able to overcome this
drawback.
Stochastic Models, in which there might be noise in the data, i.e. some
observations might lie outside Ψ. The main problem of this approach is
the separation of noise from inefficiency.
The third criterion (type of data analyzed) leads to the classification into:
Cross-sectional Models, in which the data sample consists of observations
on n firms or DMUs (Decision Making Units):
\[ \mathcal{X} = \{(X_i, Y_i) \mid i = 1, \dots, n\} \]
Panel Data Models, in which the observations on the n firms are available
over T periods of time:
\[ \mathcal{X} = \{(X_{it}, Y_{it}) \mid i = 1, \dots, n;\ t = 1, \dots, T\} \]
(1982). These issues include the definition of the Malmquist productivity in-
dex; although all are based on the distance functions that Malmquist employed
to formulate his original quantity index, variations include the geometric mean
form used by Färe, Grosskopf, Lindgren and Roos (1989) and the quantity in-
dex form by Diewert (1992). The survey of the empirical literature presents
studies on the public sector, banking, agriculture, countries and international
comparisons, electric utilities, transportation, and insurance. See also Lovell
(2003), and Grosskopf (2003) for an historical perspective and an outline of the
state of the art in this area.
Although productivity change is not the main focus of FDH, it can be inferred
from information on efficiency change and technical change that is revealed by
FDH. The technique was developed by Tulkens, who named it “sequential FDH”.
For an illustration of the sequential FDH see Lovell (1993, pp. 48-49). On this
topic see also Tulkens and Vanden Eeckaut (1995a, 1995b).
By combining the three criteria mentioned above, several models have been
studied in the literature:
Parametric Deterministic Models, see e.g. Aigner and Chu (1968), Afriat
(1972), Richmond (1974), Schmidt (1976) and Greene (1980) for cross-
sectional and panel data;
Parametric Stochastic Models, most of these techniques are based on the
maximum likelihood principle, following the pioneering works of Aigner,
Lovell and Schmidt (1977) and Meeusen and van den Broeck (1977). For
a recent review see Kumbhakar and Lovell (2000). In the context of
panel data, stochastic models (see Schmidt and Sickles, 1984, and Corn-
well, Schmidt, and Sickles, 1990) have semiparametric generalizations,
in which a part of the model is parametric and the rest is nonparamet-
ric (see Park and Simar, 1994; Park, Sickles and Simar, 1998; and Park,
Sickles and Simar, 2003a, b).
Nonparametric Deterministic Models for cross-sectional and panel data.
Traditional references on these models include: Färe, Grosskopf and
Lovell (1985, 1994), Fried, Lovell and Schmidt (1993), and Charnes,
Cooper, Lewin and Seiford (1994). Recent and updated references are
Cooper, Seiford and Tone (2000), Ray (2004) and Färe and Grosskopf
(2004).
Nonparametric Stochastic Models for cross-sectional data (see Hall and
Simar, 2002; Simar, 2003b; Kumbhakar, Park, Simar and Tsionas, 2004)
and panel data (see Kneip and Simar, 1996; and Henderson and Simar,
2005).
The approaches most used in empirical work are the nonparametric (deterministic)
frontier approach and the (parametric) stochastic frontier approach.
30 The measurement of efficiency
Ψ̂_DEA is thus the smallest free disposal convex set covering all the data.
The Ψ̂_DEA in (2.14) allows for Variable Returns to Scale (VRS) and is often
referred to as Ψ̂_DEA-VRS (see Banker, Charnes and Cooper, 1984). It may be
adapted to other returns to scale situations. It allows for:
Constant Returns to Scale (CRS) if the equality constraint Σ_{i=1}^n γ_i = 1
in (2.14) is dropped;
Non Increasing Returns to Scale (NIRS) if the equality constraint Σ_{i=1}^n γ_i = 1
in (2.14) is changed to Σ_{i=1}^n γ_i ≤ 1;
Non Decreasing Returns to Scale (NDRS) if the equality constraint Σ_{i=1}^n γ_i = 1
in (2.14) is changed to Σ_{i=1}^n γ_i ≥ 1.
The estimate of the input requirement set is given for all y by
Ĉ(y) = {x ∈ R^p_+ | (x, y) ∈ Ψ̂_DEA}, and ∂Ĉ(y) denotes the estimator of the input
frontier boundary for y.
For a firm operating at level (x0 , y0 ) the estimation of the input efficiency
score θ(x0 , y0 ) is obtained by solving the following linear program (here and
hereafter we consider the VRS case):
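\[ \hat\theta_{DEA}(x_0, y_0) = \inf\{\theta \mid (\theta x_0, y_0) \in \hat\Psi_{DEA}\} \tag{2.15} \]
\[ \hat\theta_{DEA}(x_0, y_0) = \min\Big\{\theta \;\Big|\; y_0 \le \sum_{i=1}^n \gamma_i Y_i;\ \theta x_0 \ge \sum_{i=1}^n \gamma_i X_i;\ \theta > 0;\ \sum_{i=1}^n \gamma_i = 1;\ \gamma_i \ge 0;\ i = 1, \ldots, n\Big\}. \tag{2.16} \]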
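\[ \hat\lambda_{DEA}(x_0, y_0) = \max\Big\{\lambda \;\Big|\; \lambda y_0 \le \sum_{i=1}^n \gamma_i Y_i;\ x_0 \ge \sum_{i=1}^n \gamma_i X_i;\ \lambda > 0;\ \sum_{i=1}^n \gamma_i = 1;\ \gamma_i \ge 0;\ i = 1, \ldots, n\Big\}. \tag{2.18} \]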
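As an illustration of the input oriented program (2.16), the following minimal sketch computes the VRS score in the special case of one input and one output. It relies on an assumption that holds only in this special case: the minimum is then reached either on a single observation or on the segment joining two observations, so that a simple enumeration replaces a linear programming solver. The function name and the data are hypothetical.

```python
def dea_vrs_input_score_1x1(x0, y0, X, Y):
    """Input oriented VRS DEA score for ONE input and ONE output.

    Sketch only: with a single input and a single output, the optimum of
    the linear program uses at most two nonzero gamma_i, so we enumerate
    single observations and pairs of observations.
    """
    best_x = float("inf")
    n = len(X)
    for i in range(n):
        if Y[i] >= y0:
            # A single observation producing at least y0.
            best_x = min(best_x, X[i])
        for j in range(n):
            if Y[i] < y0 <= Y[j]:
                # Convex combination of i and j producing exactly y0.
                t = (y0 - Y[i]) / (Y[j] - Y[i])
                best_x = min(best_x, (1.0 - t) * X[i] + t * X[j])
    if best_x == float("inf"):
        raise ValueError("y0 is not attainable by any convex combination")
    return best_x / x0


# Hypothetical data: four firms, one input, one output.
X = [1.0, 2.0, 4.0, 3.0]
Y = [1.0, 3.0, 4.0, 2.0]
theta_interior = dea_vrs_input_score_1x1(3.0, 2.0, X, Y)   # firm (3, 2)
theta_frontier = dea_vrs_input_score_1x1(2.0, 3.0, X, Y)   # firm (2, 3)
```

With several inputs or outputs this enumeration is no longer valid and a general linear programming solver is needed.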
In Figure 2.2 we display the DEA estimator and illustrate the concept of slacks
through an example. If we look at the left panel assuming that all firms produce
the same level of output, we can see that the DMU E could actually produce 1
unit of y with less input x1 , i.e., it could reduce x1 by one unit (from 4 to 3)
moving from E to D. This is referred to as input slack: although the DMU is
technically efficient, there is a surplus of input x1.¹¹ In general, we say that there
is slack in input j of DMU i, i.e., x_i^j, if:
\[ \sum_{i=1}^n \gamma_i x_i^j < x_i^j\, \hat\theta(x_i, y_i) \tag{2.19} \]
is true for some solution value of γi , i = 1, ..., n (see Färe, Grosskopf and
Lovell, 1994, for more details).
The same kind of reasoning can be done for the output oriented case, i.e. the
DMU L could increase the production of y1 moving from L to M. See Figure
2.2, right panel for a graphical illustration.
Slacks may occur for DEA estimates (as shown in Figure 2.2), as well as
for FDH estimates (presented in the next section). It is interesting to note that
if the true production set Ψ has no slacks, then slacks are only a small sample
problem. Nevertheless, it is always useful to report slacks whenever they are
11 Remember the “possibility of destroying goods without costs” underlying the frontier representation of
the economic model.
The nonparametric frontier approach 33
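[Figure 2.2: the DEA estimator and slacks. Left panel: input requirement set C(y), horizontal axis x1, with DMUs D and E at x1 = 3 and x1 = 4. Right panel: output correspondence set P(x), horizontal axis y1.]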
there. It is left to the analyst to decide whether it is better to correct for the slacks or
just to point them out.
Once the efficiency measures have been computed, several interesting analyses
can be carried out, such as the inspection of the distribution of efficiency scores
and the analysis of the “best performers” or efficient facet of the frontier closest
to the analysed DMU, generally called peer-analysis, to study the technically
efficient units and try to learn from them.
It is the union of all the positive orthants in the inputs and of the negative orthants
in the outputs whose origin coincides with the observed points (Xi , Yi ) ∈
X (Deprins, Simar and Tulkens, 1984). See Figures 2.3 and 2.4 where the
FDH estimator is compared with the DEA estimator of the input and output
requirement sets, respectively.
The efficiency estimators, in this framework, are obtained (as for the DEA
case) using a “plug-in principle”, i.e., by substituting the unknown quantities (in
this case Ψ) by their estimated values (here Ψ̂_FDH; for the DEA case, Ψ̂_DEA).
The estimated input requirement set and the output correspondence set are
the following:
\[ \hat C(y) = \{x \in \mathbb{R}^p_+ \mid (x, y) \in \hat\Psi_{FDH}\}, \]
Hence, the estimated input efficiency score for a given point (x0 , y0 ) ∈ Ψ is:
\[ \hat\theta_{FDH}(x_0, y_0) = \inf\{\theta \mid \theta x_0 \in \hat C(y_0)\} = \inf\{\theta \mid (\theta x_0, y_0) \in \hat\Psi_{FDH}\}, \tag{2.21} \]
It is clear that, for a particular point (x0, y0), the estimated distances to the
frontier are evaluated in the input space (“input oriented”) by the distance
from this point to the estimated frontier of the input requirement
set (∂Ĉ(y)), and in the output space (“output oriented”) by the distance from
(x0, y0) to the estimated frontier of the output correspondence set (∂P̂(x)).
It is worthwhile to note that the FDH attainable set in (2.20) can also be
characterized as the following set:
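\[ \hat\Psi_{FDH} = \Big\{(x, y) \in \mathbb{R}^{p+q}_+ \;\Big|\; y \le \sum_{i=1}^n \gamma_i Y_i;\ x \ge \sum_{i=1}^n \gamma_i X_i;\ \sum_{i=1}^n \gamma_i = 1;\ \gamma_i \in \{0, 1\},\ i = 1, \ldots, n\Big\}. \tag{2.23} \]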
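\[ \hat\theta_{FDH}(x_0, y_0) = \min\Big\{\theta \;\Big|\; y_0 \le \sum_{i=1}^n \gamma_i Y_i;\ \theta x_0 \ge \sum_{i=1}^n \gamma_i X_i;\ \sum_{i=1}^n \gamma_i = 1;\ \gamma_i \in \{0, 1\},\ i = 1, \ldots, n\Big\}, \tag{2.24} \]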
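\[ \hat\lambda_{FDH}(x_0, y_0) = \max\Big\{\lambda \;\Big|\; \lambda y_0 \le \sum_{i=1}^n \gamma_i Y_i;\ x_0 \ge \sum_{i=1}^n \gamma_i X_i;\ \sum_{i=1}^n \gamma_i = 1;\ \gamma_i \in \{0, 1\},\ i = 1, \ldots, n\Big\}. \tag{2.25} \]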
The latter expressions make it easier to compare the FDH
and the DEA estimators (compare for instance (2.23) with (2.14)).
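Because each γ_i in (2.24) and (2.25) is either 0 or 1 and the γ_i sum to one, exactly one observation is selected, so both programs reduce to a search over the sample. The sketch below is one possible way to carry out this search; the function names and the data are hypothetical, and inputs and outputs are assumed to be strictly positive.

```python
def fdh_input_score(x0, y0, X, Y):
    """FDH input efficiency: search over units producing at least y0."""
    candidates = [
        # Smallest theta such that theta * x0 >= X_i componentwise.
        max(xi_j / x0_j for xi_j, x0_j in zip(xi, x0))
        for xi, yi in zip(X, Y)
        if all(yi_k >= y0_k for yi_k, y0_k in zip(yi, y0))
    ]
    return min(candidates)


def fdh_output_score(x0, y0, X, Y):
    """FDH output efficiency: search over units using at most x0."""
    candidates = [
        # Largest lambda such that lambda * y0 <= Y_i componentwise.
        min(yi_k / y0_k for yi_k, y0_k in zip(yi, y0))
        for xi, yi in zip(X, Y)
        if all(xi_j <= x0_j for xi_j, x0_j in zip(xi, x0))
    ]
    return max(candidates)


# Hypothetical data: four units, two inputs, one output.
X = [(2.0, 4.0), (4.0, 2.0), (3.0, 3.0), (6.0, 6.0)]
Y = [(1.0,), (2.0,), (1.0,), (1.0,)]
theta_last = fdh_input_score(X[3], Y[3], X, Y)
lambda_last = fdh_output_score(X[3], Y[3], X, Y)
```

If the evaluated point is not itself an observation and no unit dominates it, the candidate list is empty and `min` or `max` raises a `ValueError`; a production implementation would handle that case explicitly.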
Figure 2.3 illustrates the estimation of the input requirement set C(y) and
of its boundary ∂C(y) through FDH and DEA methods. The dashed line rep-
resents the FDH estimation of ∂C(y), while the solid line shows the DEA
estimation of it. The squares are the observations. The DEA and FDH estimates
of the efficiency score of production unit B in Figure 2.3 are, respectively:
θ̂_DEA(x0, y0) = |OB″|/|OB| ≤ 1 and θ̂_FDH(x0, y0) = |OB′|/|OB| ≤ 1.
In Figure 2.4 we show the FDH and DEA estimation of the output corre-
spondence set P (x) and its boundary ∂P (x). The dash-dotted line represents
the FDH estimator of ∂P (x), while the solid line the DEA estimator of it.
The black squares, as before, represent the DMUs. For firm B, the estimates
of its efficiency score, in the output oriented framework, are: λ̂_FDH(x0, y0) =
|OB′|/|OB| ≥ 1 and λ̂_DEA(x0, y0) = |OB″|/|OB| ≥ 1.
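[Figure 2.3: FDH and DEA estimates of the input requirement set C(y) and of its boundary ∂C(y); horizontal axis x1; unit B = (x0, y0) with projections B′ and B″, and unit A.]
[Figure 2.4: FDH and DEA estimates of the output correspondence set P(x) and of its boundary ∂P(x); horizontal axis y1; unit B = (x0, y0) with projections B′ and B″.]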
For a DMU (x0 , y0 ), in a first step, the set of observations which domi-
nates it is determined, and then the estimate of its efficiency score, relative to
the dominating facet of Ψ is computed. In the simplest case, with a technol-
ogy characterized by one input and one output, the set of observations which
dominate (x0 , y0 ) is defined as:
\[ D_0 = \{\, i \mid (X_i, Y_i) \in \mathcal{X},\ X_i \le x_0,\ Y_i \ge y_0 \,\}. \tag{2.26} \]
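The two steps just described can be sketched for the one input, one output case as follows. The function names and the data are hypothetical; the second step uses the fact that, among the dominating observations, the input efficiency score is the smallest ratio X_i / x0.

```python
def dominating_set(x0, y0, X, Y):
    """Indices of the observations that dominate (x0, y0), as in (2.26)."""
    return [i for i, (xi, yi) in enumerate(zip(X, Y)) if xi <= x0 and yi >= y0]


def fdh_input_score_from_dominating_set(x0, y0, X, Y):
    """Second step: best input ratio among the dominating observations."""
    d0 = dominating_set(x0, y0, X, Y)
    return min(X[i] / x0 for i in d0)


# Hypothetical sample of five firms; firm 4 is the one being evaluated.
X = [2.0, 3.0, 5.0, 4.0, 6.0]
Y = [2.0, 4.0, 5.0, 3.0, 3.0]
d0 = dominating_set(X[4], Y[4], X, Y)
theta = fdh_input_score_from_dominating_set(X[4], Y[4], X, Y)
```

Note that an observed unit always dominates itself, so the set is never empty when (x0, y0) belongs to the sample.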
emerged that FDH can be economically more meaningful than convex monotone
hull, also under non-trivial alternative economic conditions.
Hence, FDH technical efficiency measures remain meaningful for theories
of the firm that do allow for imperfect competition or uncertainty (see e.g.
Kuosmanen and Post, 2001, and Cherchye, Kuosmanen and Post, 2001).
One of the main drawbacks of deterministic frontier models (DEA /FDH
based) is the influence of “super-efficient” outliers.
This is a consequence of the fact that the efficient frontier is determined by
sample observations which are extreme points. Simar (1996) points out the need
for identifying and eliminating outliers when using deterministic models. If they
cannot be identified, the use of stochastic frontier models is recommended.
See Figure 2.5 for an illustration of the influence of outliers in case of FDH
estimation. The same is valid for the DEA case. If point A is an extreme point
lying outside the cloud of other points, the estimated efficient frontier is strongly
influenced by it. In fact, in Figure 2.5, the solid line is the frontier that envelops
point A, while the dash-dotted line does not envelop point A.
Figure 2.5. Influence of outliers on the FDH estimation of the production set Ψ. [Graphic not reproduced: it shows the FDH frontier of Ψ̂_FDH with point A, the FDH frontier without A, and the free disposability region; the vertical axis is y.]
12 See also Lovell (2001) and Fried, Lovell and Schmidt (2006) for a presentation of some recent fruitful
research areas introduced in parametric and nonparametric approaches to efficiency analysis.
pose of collecting data and information that are useful for production studies.
Sengupta (1992) was the first to introduce a fuzzy mathematical programming
approach where the constraints and objective function are not satisfied crisply.
Seaver and Triantis (1992) proposed a fuzzy clustering approach to identify
unusual or extreme efficient behavior. Girod and Triantis (1999) implemented
a fuzzy linear programming approach, whilst Triantis and Girod (1998), and
Kao and Liu (1999) used fuzzy set theory, to let the traditional DEA and FDH
account for inaccuracies associated with the production plans. A fuzzy pairwise
dominance approach can be found in Triantis and Vanden Eeckaut (2000),
where a classification scheme that explicitly accounts for the degree of fuzziness
(plausibility) of dominating units is reported.
According to a classification proposed by Angulo-Meza and Pereira Estellita
Lins (2002), the methods for increasing discrimination within efficient DMUs
in a DEA setting can be classified into two groups:
Methods with a priori information. In these methods, the information pro-
vided by a decision-maker or an expert about the importance of the variables
can be introduced into the DEA models. There are three main methods devoted
to incorporating a priori information or value judgments in DEA:
13 See Allen, Athanassopoulos, Dyson and Thanassoulis (1997), and Pedraja-Chaparro, Salinas-Jimenes,
Smith and Smith (1997) for a review of some methods within this approach, including direct weight restric-
tions, cone ratio models, assurance region and virtual inputs and outputs restrictions.
Recent developments in nonparametric efficiency analysis 41
information. The main methods that minimize the intervention of the experts
are:
statistical model; there is not, in fact, a definition of the Data Generating Process
(DGP) and there is no room for statistical inference based on the construction
of confidence intervals, estimation of the bias, statistical tests of hypothesis and
so on.
There is instead a new approach, recently developed, which aims exactly at
the analysis of the statistical properties of the nonparametric estimators, trying
to overcome most limitations of traditional nonparametric methods and allow-
ing for statistical inference and rigorous testing procedures. This literature is
the main focus of this book. Chapter 3 is devoted to a review of the statistical
properties of nonparametric frontier estimators. Chapter 4
deals in detail with a family of robust nonparametric measures of efficiency,
which are more resistant to the influence of outliers and errors in data while
having good statistical properties that make inference feasible in this complex
framework. Finally, Chapter 5 illustrates and further develops the topic of
conditional and robust measures of efficiency, and presents an alternative way to evaluate
the impact of external-environmental variables based on conditional measures
of efficiency.
Chapter 3
STATISTICAL INFERENCE IN
NONPARAMETRIC FRONTIER
ESTIMATION
14 For a selective survey on statistical inference in nonparametric frontier estimation see Grosskopf (1996).
Recent reviews are Simar and Wilson (2000a and 2006a).
44 Statistical inference in nonparametric frontier estimation
Let us start with a simple case where we only have a univariate output and
we want to carry out an output oriented efficiency analysis. By considering this
simple case, the reader will more easily understand the analogy with standard
regression models, parametric or nonparametric ones. The nonparametric fron-
tier can be defined as some unknown function ψ(x) sharing some properties
(monotonicity, possibly concavity, . . . ) that can be expressed as follows:
yi = ψ(xi ) − ui , ui ≥ 0 (3.1)
This assumption is common in most empirical studies and just states that the
observations are considered as random draws from a population of firms (this is
typically what is done in the simple model described above, even in a parametric
approach).
SA3: Smoothness. For all (x, y) in the interior of Ψ, the functions θ(x, y)
and λ(x, y) are differentiable in both arguments.
It is a sufficient condition used by Kneip, Simar and Wilson (2003) to derive
the asymptotic distribution of the DEA estimator; for the FDH estimator (where
Ψ is not assumed to be convex), only Lipschitz continuity of the functions is
required in Park, Simar and Weiner (2000).
Summing up, the DGP P is completely characterized by the knowledge of
f (x, y) and of its support Ψ with the regularity conditions (SA1–SA3) described
above. Hence, we can write P = P(Ψ, f (·, ·)).
setup. The drawback is that in practice, a large number of time periods is needed
for getting sensible results. New directions in this area have been proposed by
Henderson and Simar (2005).
In this book we focus the presentation on deterministic frontier models, so
that the popular nonparametric envelopment estimators can be considered. But
since we know these estimators are sensitive to extreme values and outliers (due
to the absence of noise in the model), we pay special attention to develop
estimators which are robust to these extreme points: this will be the major topic
of Chapter 4. In the next sections we summarize the main known statistical
properties of the DEA/FDH estimators and we indicate how the bootstrap can
be implemented to solve practical inferential problems.
3.3.1 Consistency
The first minimal property one would like to achieve is consistency. Roughly
speaking, consistency means that if the sample size increases, an estimator θ̂ will
converge to the true but unknown value θ it is supposed to estimate. Mathematically,
we will say that θ̂ →p θ as n → ∞, meaning that as the sample size
increases to infinity, the probability of the error |θ̂ − θ| being greater than any
positive value ε > 0 converges to zero. This is a minimal property that an
estimator should have to be reliable. Another important issue is then the rate of
convergence of the consistent estimator. It indicates the possibility of getting
sensible results with finite sample estimators. In classical parametric statistics
(like linear regression models), estimators achieve √n-consistency, meaning
that the order of the error of estimation decreases to zero like n^{−1/2} when
n → ∞. We write:
\[ \hat\theta - \theta = O_p(n^{-1/2}). \tag{3.2} \]
where no indication was given about the rates of convergence. These rates
were obtained for the DEA case (where convexity of Ψ is required) and for the
FDH case (where convexity of Ψ is not required) in Korostelev, Simar and
Tsybakov (1995). For instance, in the output oriented case (one output) they
obtained:
\[ d(\hat\Psi_{FDH}, \Psi) = O_p\big(n^{-\frac{1}{p+1}}\big), \]
\[ d(\hat\Psi_{DEA}, \Psi) = O_p\big(n^{-\frac{2}{p+2}}\big), \]
where d(Ψ̂, Ψ) is the Lebesgue measure of the difference between the two
sets and where p is the number of inputs (similar rates are obtained for the
corresponding efficiency measures). The rates of convergence reflect the curse
of dimensionality typical of many nonparametric statistical techniques: if p is
large, the estimators exhibit very low rates of convergence, and a much larger
quantity of data is needed to get sensible estimates (i.e. to avoid large variances
and very wide confidence interval estimates) than in the case of a small number
of inputs p. Note that for p = 1 the DEA estimator achieves a better rate, n^{−2/3}, than the standard
parametric rate n^{−1/2}.
Much later, Kneip, Park and Simar (1998) for the DEA case and Park, Simar
and Weiner (2000) for the FDH case obtained the proof of the consistency of
the estimated efficiency scores in the full multivariate setup (p, q > 1) along
with their rates of convergence. The difficulty here was to handle the radial
nature of the difference between the efficiency scores. Formally they obtain:
\[ \hat\theta_{DEA}(x, y) - \theta(x, y) = O_p\big(n^{-\frac{2}{p+q+1}}\big), \tag{3.4} \]
\[ \hat\theta_{FDH}(x, y) - \theta(x, y) = O_p\big(n^{-\frac{1}{p+q}}\big). \tag{3.5} \]
These results again reflect the curse of dimensionality which is even worse for
the multivariate case since the convergence rates are affected by p + q rather
than merely by p (or q), as for the former univariate case.
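To get a feeling for what the rates in (3.4) and (3.5) imply, the following sketch computes the sample size at which n raised to minus the rate exponent equals the parametric order n0^{−1/2}. This is an order of magnitude comparison only: the constants hidden in the O_p terms are ignored, and the function names are hypothetical.

```python
import math


def fdh_rate_exponent(p, q):
    """Exponent a in n**(-a) for the FDH efficiency score, as in (3.5)."""
    return 1.0 / (p + q)


def dea_rate_exponent(p, q):
    """Exponent a in n**(-a) for the DEA efficiency score, as in (3.4)."""
    return 2.0 / (p + q + 1)


def equivalent_sample_size(n_parametric, exponent):
    """Solve n**(-exponent) = n_parametric**(-1/2) for n (constants ignored)."""
    return n_parametric ** (0.5 / exponent)


# Three inputs and two outputs, compared with a parametric sample of 100.
n_fdh = equivalent_sample_size(100, fdh_rate_exponent(3, 2))
n_dea = equivalent_sample_size(100, dea_rate_exponent(3, 2))
```

In this example the FDH estimator needs a sample of order 100^{2.5} and the DEA estimator a sample of order 100^{1.5} to reach the order of error that a √n-consistent estimator reaches with 100 observations.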
These results are encouraging: the methods used by researchers for decades
were indeed consistent! But these results are of little practical importance for
doing inference. To achieve this we need the sampling distributions of the
estimators in order to derive the possible bias, to compute the standard deviation
or, even better, to build confidence intervals for individual efficiency scores
θ(x, y).
In this complex situation, the only hope is to obtain asymptotic results, i.e.
a reasonable approximation of the sampling distribution of the estimator when
n is large enough (in the same spirit that a Central Limit Theorem gives an
approximate normal distribution of a sample mean when n is large enough).
We will see below that, although theoretical results are available today, they are of
little practical interest; they are nevertheless useful to prove the consistency of the bootstrap
alternative.
Asymptotic results 49
here, again, the limiting Weibull depends on some unknown parameters depending
on the DGP, but these can be estimated. This result makes it possible to obtain
bias-corrected estimators and confidence intervals for the efficiency scores. However,
Park, Simar and Weiner illustrate how imprecise the asymptotic distribution is
when p + q is large with moderate sample sizes (for instance, they recommend
n to be larger than, say, 1000 if p + q = 5). Again, similar results are derived
for the output oriented case.
15 The Baron had fallen to the bottom of a deep lake. Just when it looked like all was lost, he thought to pick
himself up by his own bootstraps.
Bootstrap techniques and applications 51
Except in very few simple problems (like estimating the mean and the variance
of a normal model), the sampling distribution L(θ̂(X)) is unknown or
only asymptotic approximations are available. The aim of the bootstrap is to
provide an approximation of this distribution which is easy to obtain by
using Monte-Carlo approximations. Under regularity conditions, the only thing
required to implement the bootstrap is a consistent estimator of the
DGP P.
Indeed, if this DGP P were known, it would be very easy to approximate
the sampling distribution of θ̂(X) without any mathematical developments, by
a simple Monte-Carlo experiment that the computer could perform for us. We
could indeed simulate a random sample X from P a large number of times
and then compute the corresponding value of θ̂(X) in each Monte-Carlo trial.
By repeating this exercise a large number of times, the Monte-Carlo empirical
distribution of the observed values θ̂(X) would provide a Monte-Carlo
approximation of the true but unknown sampling distribution L(θ̂(X)). This is
a direct consequence of the strong law of large numbers, and the quality of the
approximation depends only on the number of replications in the Monte-Carlo
exercise (which the user can choose as large as she/he wants): no mathematics is
needed here, only some computing time on the computer that will perform this
simulation.
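The Monte-Carlo experiment described above can be sketched in a few lines when the DGP is known. As a purely illustrative assumption, we take P to be the uniform distribution on [0, θ] with θ = 1 and the estimator to be the sample maximum, a one-dimensional analogue of a boundary estimator; the function names are hypothetical.

```python
import random
import statistics


def monte_carlo_sampling_distribution(estimator, draw_sample, n_trials, rng):
    """Simulate n_trials samples from a KNOWN DGP and apply the estimator."""
    return [estimator(draw_sample(rng)) for _ in range(n_trials)]


THETA_TRUE = 1.0   # assumed true boundary of the support
N_OBS = 50         # sample size in each Monte-Carlo trial


def draw_uniform_sample(rng):
    """One random sample of size N_OBS from the assumed DGP."""
    return [rng.uniform(0.0, THETA_TRUE) for _ in range(N_OBS)]


rng = random.Random(12345)
values = monte_carlo_sampling_distribution(max, draw_uniform_sample, 2000, rng)
mc_mean = statistics.mean(values)   # Monte-Carlo estimate of E[theta_hat]
```

In this toy model the exact expectation of the maximum is θ n/(n + 1), so the Monte-Carlo mean can be compared with 50/51; the estimator is biased downwards, as envelopment estimators are.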
The bootstrap principle is now easy to explain: since P is unknown, we will
plug in an appropriate consistent estimator P̂ in the place of P in the Monte-Carlo
experiment above. Here we will call a bootstrap sample a random sample
X* generated from P̂. If some care is taken on how to generate these bootstrap
samples, it can be proven, when the bootstrap works, that the empirical (Monte-Carlo)
bootstrap distribution of θ̂(X*), which is conditional on P̂, approximates
the unknown L(θ̂(X)). In fact, as we will see below, to build confidence intervals
it is more appropriate to approximate the unknown distribution of θ̂(X) − θ
by the bootstrap distribution of θ̂(X*) − θ̂(X) conditional on the estimate
P̂. The error of estimation θ̂(X) − θ is sometimes referred to as the estimation
error in the real world, whereas θ̂(X*) − θ̂(X) is the error of estimation in the
bootstrap world, where the true unknown P and θ have been replaced by the
known observed P̂ and θ̂(X).
In many simple applications, the easiest way to generate a random sample X*
according to an estimate P̂ of P is to mimic what has been done in the real world.
In the real world X = {X1, . . . , Xn} is generated from P, so a nonparametric
estimator of P could be chosen as the empirical process which gives a mass
1/n at each observed sample point Xi ∈ X. So a bootstrap sample will be
defined as X* = {X*_1, . . . , X*_n}, where each X*_j is obtained by drawing with
replacement from the n values {X1, . . . , Xn}. This is sometimes referred to as
the naive bootstrap and is very easy to implement.
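A minimal sketch of the naive bootstrap is given below for a generic estimator. The function name and the data are hypothetical; the sample mean is used only as an example of a smooth statistic for which this resampling scheme is known to work.

```python
import random
import statistics


def naive_bootstrap(sample, estimator, n_boot, rng):
    """Return theta_hat and the bootstrap errors theta_hat(X*) - theta_hat(X)."""
    n = len(sample)
    theta_hat = estimator(sample)
    errors = []
    for _ in range(n_boot):
        # Draw n values with replacement: each point has mass 1/n.
        boot_sample = [sample[rng.randrange(n)] for _ in range(n)]
        errors.append(estimator(boot_sample) - theta_hat)
    return theta_hat, errors


rng = random.Random(2024)
sample = [2.1, 3.4, 1.9, 4.0, 2.8, 3.1]
theta_hat, errors = naive_bootstrap(sample, statistics.mean, 500, rng)
```

The list `errors` is the Monte-Carlo approximation of the bootstrap world error, which stands in for the unknown real world error of the estimator.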
When we say that the bootstrap works we mean that the bootstrap approximation
is consistent, or in other words that when the sample size n of X increases,
the bootstrap distribution of θ̂(X*) − θ̂(X) conditional on P̂ converges to the true
distribution of θ̂(X) − θ. This is the crucial point for the bootstrap. It is often
true in statistics that the bootstrap (when correctly implemented) is consistent,
but it is also well known that there are cases where the bootstrap is inconsistent.
This may depend on the model, on the properties of the estimate P̂, but also on
the way random samples X* are generated from P̂. This is particularly true in the
case of estimating boundaries or supports of random variables, as is the case
in frontier models. This issue is discussed below where consistent solutions are
provided.
16 Some other bootstrap procedures have been presented in the literature, but their inconsistency has been
demonstrated; see below.
\[ \hat\delta(x, y) = \sup\Big\{\delta \;\Big|\; \Big(\frac{x}{\delta}, y\Big) \in \hat\Psi_{DEA}\Big\}. \]
The latter can be calculated through the following linear program:
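\[ \big(\hat\delta(x, y)\big)^{-1} = \min\Big\{\theta > 0 \;\Big|\; y \le \sum_{i=1}^n \gamma_i Y_i;\ \theta x \ge \sum_{i=1}^n \gamma_i X_i;\ \sum_{i=1}^n \gamma_i = 1;\ \gamma_i \ge 0;\ i = 1, \ldots, n\Big\}. \tag{3.10} \]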
Since the left-hand side of (3.11) is available (through the Monte-Carlo exercise),
it can be used to provide properties usually obtained from the right-hand side.
In particular we can use the bootstrap approximation to estimate the bias of
the DEA estimator or to estimate the quantiles of the sampling distribution of
(δ̂(x, y) − δ(x, y)) in order to build confidence intervals.
In Table 3.1 below, the analogy between the original inferential problem
and the bootstrap is described in terms of an analogy between the real world,
where we want to make inference about the parameter δ(x, y) but most of the
desired quantities are unknown, and the bootstrap world, where we mimic the
real world but where everything is known and so can be computed or simulated
Table 3.1. Summary of the bootstrap principle for inference on δ(x, y).
             Real world              Bootstrap world
Moments      E_P(δ̂(x, y))            E_P̂(δ̂*(x, y))
             Var_P(δ̂(x, y))          Var_P̂(δ̂*(x, y))
3.4.2 Correcting the bias of δ̂(x, y)
An estimator is a random variable since it is computed as a function of a
random sample. An unbiased estimator has the desirable property that its mean
is equal to the target value of the parameter being estimated. In our case, we
know by construction that the DEA estimator δ̂(x, y) is a biased estimator of
δ(x, y) (for the input Shephard distance considered here, δ̂(x, y) < δ(x, y)
\[ \mathrm{bias}(\hat\delta(x, y)) \approx \frac{1}{B}\sum_{b=1}^{B} \hat\delta^*_b(x, y) - \hat\delta(x, y). \tag{3.14} \]
In the same way the standard deviation of the DEA estimator δ̂(x, y) is
obtained as the square root of the variance of the bootstrap distribution denoted
G_P̂(t) in Table 3.1. Namely:
\[ \mathrm{std}^2(\hat\delta(x, y)) \approx \frac{1}{B}\sum_{b=1}^{B} \hat\delta^{*2}_b(x, y) - \Big(\frac{1}{B}\sum_{b=1}^{B} \hat\delta^*_b(x, y)\Big)^2. \tag{3.15} \]
\[ \qquad = 2\,\hat\delta(x, y) - \frac{1}{B}\sum_{b=1}^{B} \hat\delta^*_b(x, y). \tag{3.16} \]
However, it is well known that correcting for the bias introduces additional
noise (increasing the variance of the estimator). As a rule of thumb, Efron and
Tibshirani (1993) recommend not to correct for the bias unless
|bias(δ̂(x, y))| > std(δ̂(x, y))/4. In practice, due to the inherent bias of the DEA estimator, the
bias correction almost always has to be performed. Numerical examples are
provided in the second part of this book.
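Given B bootstrap values of the distance function, the quantities in (3.14) to (3.16) and the rule of thumb above are simple averages. The sketch below assumes that the bootstrap values have already been produced by some consistent bootstrap procedure; the function name and the numbers are hypothetical.

```python
import math


def bootstrap_bias_std(delta_hat, boot_values):
    """Bias, standard deviation and bias-corrected value from B bootstrap draws."""
    B = len(boot_values)
    mean_boot = sum(boot_values) / B
    bias = mean_boot - delta_hat                                  # as in (3.14)
    var = sum(v * v for v in boot_values) / B - mean_boot ** 2    # as in (3.15)
    std = math.sqrt(max(var, 0.0))       # guard against tiny negative rounding
    corrected = 2.0 * delta_hat - mean_boot                       # as in (3.16)
    # Rule of thumb: correct only if the bias is large relative to the std.
    apply_correction = abs(bias) > std / 4.0
    return bias, std, corrected, apply_correction


# Hypothetical bootstrap values; they lie below delta_hat, as expected for
# the input Shephard distance.
delta_hat = 1.20
boot_values = [1.10, 1.15, 1.12, 1.18, 1.05]
bias, std, corrected, apply_correction = bootstrap_bias_std(delta_hat, boot_values)
```

In practice B is of course much larger than five; the short list is only meant to make the arithmetic easy to follow.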
H_P(·) were known, it would be easy to find, for instance, the values a_0.025 and
a_0.975 such that:
\[ \mathrm{Prob}_P\big(a_{0.025} \le \hat\delta(x, y) - \delta(x, y) \le a_{0.975}\big) = 0.95, \]
Since the quantiles a_β are unknown, the quantiles of the bootstrap distribution
of W* = δ̂*(x, y) − δ̂(x, y), denoted by H_P̂(t) in Table 3.1, will provide the
appropriate approximation. If â_0.025 and â_0.975 are such that
The quantiles â_β are directly obtained from the quantiles of the Monte-Carlo
distribution of the values {δ̂*_b(x, y)}, b = 1, ..., B, themselves as follows, for all β ∈ [0, 1]:
\[ \hat a_\beta = \hat c_\beta - \hat\delta(x, y), \tag{3.19} \]
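The following sketch shows one way to turn bootstrap values into a 95% interval using (3.19). The interval bounds are our own rearrangement of the probability statement above: if a_0.025 ≤ δ̂ − δ ≤ a_0.975, then δ̂ − a_0.975 ≤ δ ≤ δ̂ − a_0.025, with the unknown quantiles replaced by their bootstrap counterparts. The function name and the data are hypothetical.

```python
import statistics


def bootstrap_confidence_interval(delta_hat, boot_values):
    """95% interval for delta from the quantiles of the bootstrap values."""
    # 39 cut points at 2.5%, 5%, ..., 97.5% of the bootstrap distribution.
    cuts = statistics.quantiles(boot_values, n=40, method="inclusive")
    c_low, c_high = cuts[0], cuts[-1]
    a_low = c_low - delta_hat      # a_hat at 0.025, as in (3.19)
    a_high = c_high - delta_hat    # a_hat at 0.975, as in (3.19)
    return delta_hat - a_high, delta_hat - a_low


# Hypothetical bootstrap values, evenly spread between 1.00 and 1.20.
delta_hat = 1.20
boot_values = [1.0 + 0.002 * k for k in range(101)]
lower, upper = bootstrap_confidence_interval(delta_hat, boot_values)
```

Since the bootstrap values lie below δ̂ here, the resulting interval lies above δ̂, which is consistent with the downward bias of the estimator.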
1, ...., n}, a naive estimator of P would be P(Ψ̂, f̂(·, ·)), where f̂(·, ·) would be
the empirical distribution function of (Xi, Yi), defined as the discrete distribution
that puts a probability 1/n on each point (Xi, Yi). Then a bootstrap sample
X* = {(X*_i, Y*_i), i = 1, ...., n} would simply be obtained by randomly
sampling with replacement from X.
Unfortunately, it is well known from the bootstrap literature (Bickel and
Freedman, 1981, Efron and Tibshirani, 1993) that in a boundary estimation
framework, this bootstrap procedure does not provide a consistent approxima-
tion of the desired sampling distribution as in (3.11). Simar and Wilson (1999a,
b) discuss this issue in the context of multivariate frontier estimation. As illustrated
below, the problem comes from the fact that in the naive bootstrap, the
efficient facet that determines the value of δ̂ in the original sample X appears
too often, and with a fixed probability, in the pseudo-samples X*_b, and this fixed
probability does not vanish even when n → ∞.
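A one-dimensional analogue makes this fixed probability concrete. Suppose, as a simplifying assumption, that the boundary to be estimated is the upper end of the support of a univariate variable, estimated by the sample maximum. The naive bootstrap maximum equals the observed maximum whenever that observation is drawn at least once, which happens with probability 1 − (1 − 1/n)^n; this tends to 1 − e^{−1} ≈ 0.632 and does not vanish. The function names below are hypothetical.

```python
import math
import random


def prob_boot_max_equals_sample_max(n):
    """Exact probability that a naive bootstrap sample contains the maximum."""
    return 1.0 - (1.0 - 1.0 / n) ** n


def simulate_prob(n, n_boot, rng):
    """Monte-Carlo check of the same probability on one simulated sample."""
    sample = [rng.random() for _ in range(n)]
    top = max(sample)
    hits = 0
    for _ in range(n_boot):
        boot_max = max(sample[rng.randrange(n)] for _ in range(n))
        hits += boot_max == top
    return hits / n_boot


rng = random.Random(7)
exact_small = prob_boot_max_equals_sample_max(10)
exact_large = prob_boot_max_equals_sample_max(10 ** 6)
simulated = simulate_prob(100, 4000, rng)
```

The bootstrap distribution of the error therefore keeps a point mass at zero of about 0.63, whatever the sample size, whereas the true sampling distribution of the error is continuous.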
Two solutions have been proposed to overcome this problem: either subsampling,
meaning that we will draw pseudo-samples of size m smaller than
n, say m = [n^γ], where γ < 1 and [a] stands for the integer part of a number a,
or smoothing techniques, meaning the use of a smooth estimate f̂(·, ·) in place
of the discrete empirical one of the naive approach. Kneip, Simar and Wilson
(2003) prove the consistency of both approaches in the case of strictly convex
attainable sets Ψ.
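The subsampling rule m = [n^γ] can be sketched as follows. Drawing the m pseudo-observations with replacement, as in the naive bootstrap, is an assumption of this sketch, and the function names are hypothetical.

```python
import math
import random


def subsample_size(n, gamma):
    """Integer part of n**gamma, with 0 < gamma < 1."""
    return int(math.floor(n ** gamma))


def draw_subsample(sample, gamma, rng):
    """Draw a pseudo-sample of size m = [n**gamma] from the original sample."""
    n = len(sample)
    m = subsample_size(n, gamma)
    # Assumption: sampling with replacement, as in the naive bootstrap.
    return [sample[rng.randrange(n)] for _ in range(m)]


rng = random.Random(99)
sample = list(range(1000))
pseudo_sample = draw_subsample(sample, 0.7, rng)
```

Note that floating point rounding can matter when n^γ is an exact integer in theory (for instance 1000^{2/3}), so a careful implementation would guard against an off-by-one result in that case.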
Subsampling techniques
Subsampling is certainly the easiest procedure to apply: we follow the procedure
described above, with the only difference that the pseudo-samples X*_{m,b}
Smoothing techniques
The idea of the smooth bootstrap (see Silverman and Young, 1987) is to draw
the pseudo observations (X*_i, Y*_i) from a smooth estimate of the density f(x, y).
We know how to produce such smooth estimates (see e.g. Silverman, 1986,
Scott, 1992 or Simonoff, 1996) by using kernel estimators, but the problem is
complicated here by the fact that the range of (x, y) is bounded by the boundary
of the unknown Ψ. Simar and Wilson (1998, 2000b) propose procedures which
are easy to apply. These procedures will exploit the radial nature of the Farrell-
Debreu efficiency scores and of the Shephard distance functions.
To take this radial nature into account, it is easier to transform the Cartesian
coordinates (x, y) into polar coordinates for the input vector x when input
efficiency scores are investigated as it is the case in our presentation (the output
oriented case would use polar coordinates for the output vector y).
The polar coordinates for x are defined by its modulus ω = ω(x) ∈ R+,
where ω(x) = √(x′x), and its angle η = η(x) ∈ [0, π/2]^{p−1}, where for j =
1, ..., p − 1, η_j = arctan(x^{j+1}/x^1) if x^1 > 0 or η_j = π/2 if x^1 = 0.
The density f (x, y) can be transformed, or represented by a density f (ω, η, y)
on the new coordinates and the latter joint density can be decomposed as:
where we suppose all the conditionals exist. For a given output level y, the frontier
point x∂(y) on the ray defined by the input vector x then has a modulus
given by ω(x∂(y)) = inf{ω ∈ R+ | f(ω | y, η) > 0}, and δ(x, y) = ω(x)/ω(x∂(y)).
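The transformation to polar coordinates and its inverse can be sketched as follows. The sketch is restricted to input vectors with a strictly positive first component (the case x^1 = 0 is excluded because the inverse transformation is then not uniquely defined by the angles alone); the function names are hypothetical. The last function uses the relation δ(x, y) = ω(x)/ω(x∂(y)) to project x onto the frontier along its ray.

```python
import math


def to_polar(x):
    """Modulus and angles of an input vector x with x[0] > 0."""
    if x[0] <= 0:
        raise ValueError("this sketch assumes a strictly positive first input")
    omega = math.sqrt(sum(v * v for v in x))
    eta = [math.atan(v / x[0]) for v in x[1:]]
    return omega, eta


def from_polar(omega, eta):
    """Inverse transformation back to Cartesian coordinates."""
    tangents = [math.tan(a) for a in eta]
    x_first = omega / math.sqrt(1.0 + sum(t * t for t in tangents))
    return [x_first] + [x_first * t for t in tangents]


def frontier_point(x, delta):
    """Point on the ray of x whose modulus is omega(x) / delta."""
    omega, eta = to_polar(x)
    return from_polar(omega / delta, eta)


omega, eta = to_polar([3.0, 4.0])
x_back = from_polar(omega, eta)
x_frontier = frontier_point([3.0, 4.0], 2.0)
```

In a smooth bootstrap the same inverse transformation would be applied to simulated values of the modulus and angles to recover pseudo-observations in the original coordinates.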
We see by the latter expression that the transformation in polar coordinates
induces a conditional pdf for δ(x, y) given (y, η), namely f (δ|y, η), with sup-
port over [1, ∞). Hence, in a certain sense we have transformed the density
f (x, y) expressed in Cartesian coordinates into a density on “polar-type” coor-
dinates f (δ, η, y) = f (δ | η, y)f (η | y)f (y). Consequently, now, the DGP is
characterized by P = P(Ψ, f (δ, η, y)). The reader can see here the analogy
with the simple model (3.1) in Section 3.1, where u was the univariate random
inefficiency term; here this term is replaced by δ which has a conditional density
f (δ | η, y).
The idea of the smooth bootstrap is to use as DGP in the bootstrap world P̂ =
P(Ψ̂, f̂(δ, η, y)), where f̂(δ, η, y) will be a smooth continuous density estimate
of the unknown density from the sample of observed values (δ̂_i, η_i, Y_i) obtained
by the polar transformation described above of the original data (X_i, Y_i), and
where the unknown δ_i have been replaced by the estimates δ̂_i = δ̂(X_i, Y_i)
(which are the distance functions in the bootstrap world, i.e., with respect to the
attainable set Ψ̂).
Simar and Wilson (2000b) propose an algorithm to simulate pseudo-data
(δ*_i, η*_i, Y*_i) and to transform them back in Cartesian coordinates to obtain the
doing so, we achieve consistency of the density estimate even near its boundary
points (see Simar and Wilson, 2000b for further details).
\[ \hat f(\delta) = \frac{1}{nh}\sum_{i=1}^{n} K\Big(\frac{\delta - \hat\delta_i}{h}\Big), \tag{3.22} \]
\[ h = 1.06\, s_n\, n^{-1/5}, \]
where s_n is the empirical standard deviation of the n values δ̂_i. This is known
as the normal reference rule. It has been shown by Silverman that the choice
where r_n is the interquartile range of the n data points, is more robust to departures
from the Gaussian assumption for f(·). This latter rule is referred to as
the robust normal reference rule. It is very popular and generally gives reasonable
values for the bandwidth. Other empirical rules have been proposed in
the literature, like the Sheather and Jones (1991) method, which tries to be still
more robust to departures from the normal assumption by using higher order
empirical moments of the data points.
As a matter of fact the problem is slightly more complicated here for two
reasons: (i) there is a spurious mass at one in the sample of values δi , i =
1, . . . , n, and (ii) there is a boundary effect since δ ≥ 1 by definition and the
estimate in (3.22) does not verify this constraint. As suggested in Simar and
Wilson (2006a), the first problem is solved by deleting the spurious ones, in this
step of bandwidth and density estimation, and the second problem is addressed
by using the reflection method (see Silverman, 1986 for details). Formally, we
consider only the m values of δi > 1, for i = 1, . . . , m with m < n, then we
consider the set of the 2m values {2 − δ1 , . . . , 2 − δm , δ1 , . . . , δm } which are
now symmetrically distributed around 1. Then we compute the kernel density
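estimate with this series (without any boundary condition). Analogously to (3.22),
we have:
\[ \hat g_h(\delta) = \frac{1}{2 m h_m} \sum_{i=1}^{m} \left[ K\!\left(\frac{\delta - \hat\delta_i}{h_m}\right) + K\!\left(\frac{\delta - 2 + \hat\delta_i}{h_m}\right) \right], \tag{3.24} \]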
The optimal bandwidth by using the empirical rule (3.23) here is obtained by:
where now s2m and r2m are computed from the 2m reflected data. As pointed out
above, the distribution of these reflected values is by construction symmetric
around 1 and we observed in many applications a bell-shape for the distribution
of these reflected values. Therefore the robust normal-reference rule giving hm
offers in many applications a reasonable value for the bandwidth not far from
the optimal one.
where gh,(i) (δ) is the leave one-out estimator of g(δ) based on the 2m values
except δi . Another automatic data-driven technique based on likelihood cross-
validation is described in Section 5.3, where a variable bandwidth is obtained
in a different context by using a k-Nearest Neighbor approach.
Note that in many applications the solution of (3.25) is not very far from the
simple empirical rule (3.24). Note also that, as pointed out in Simar and Wilson
(1998), the bootstrap results are relatively stable to small changes in the selected
bandwidth. Therefore, as a reasonable first guess for hm, we suggest the
use of the easy rule (3.24).
Finally, as suggested by Simar and Wilson (2006a), the value of hm (whatever
rule is used to obtain it) has to be adjusted for scale and sample size:
\[ h = h_m \left(\frac{2m}{n}\right)^{1/5} \frac{s_n}{s_{2m}}. \tag{3.26} \]
The density estimate is then obtained through:
\[ \hat f(\delta) = \begin{cases} 2\,\hat g_h(\delta) & \text{if } \delta > 1, \\ 0 & \text{otherwise.} \end{cases} \tag{3.27} \]
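The reflection estimate in (3.24) and (3.27) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a Gaussian kernel is used for K (the text does not fix the kernel here), the bandwidth h is passed in by the caller, and the function names `gaussian_kernel`, `reflected_density` and `normal_reference_bandwidth` are hypothetical.

```python
import math
import statistics


def gaussian_kernel(u):
    """Standard normal density, used here as an assumed choice for K."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def normal_reference_bandwidth(values):
    """Normal reference rule h = 1.06 * s * n^(-1/5) for a list of values."""
    return 1.06 * statistics.stdev(values) * len(values) ** (-0.2)


def reflected_density(delta, scores, h):
    """Density estimate on (1, inf) obtained by the reflection method.

    Scores equal to one (the spurious mass) are dropped, the remaining
    m scores are reflected around 1, and twice the kernel estimate of
    the 2m reflected values is returned for delta > 1.
    """
    if delta <= 1.0:
        return 0.0
    kept = [d for d in scores if d > 1.0]
    m = len(kept)
    total = sum(
        gaussian_kernel((delta - d) / h) + gaussian_kernel((delta - 2.0 + d) / h)
        for d in kept
    )
    return 2.0 * total / (2.0 * m * h)
```

Because the reflected sample is symmetric around 1, the kernel estimate puts half of its mass on each side of 1, so doubling it on (1, ∞) gives a density that integrates to one.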
How to generate the δ∗i from f̂(δ) and build a bootstrap sample X∗?
Simar and Wilson (1998, 2006a) provide an easy-to-implement algorithm
in which the density estimate f̂(δ) is not even needed to generate
the δ∗i. We only need the selected value of h and the original DEA scores
{δ̂i ; i = 1, . . . , n}.
The algorithm proceeds as follows:
[1] We first draw a random sample of size n with replacement (as in the
naive bootstrap) from the set of the 2n reflected original DEA scores
{2 − δ̂1, . . . , 2 − δ̂n, δ̂1, . . . , δ̂n}, obtaining {δ̃i ; i = 1, . . . , n}.
[2] Then we smooth the naive bootstrap resampled values by perturbing δ̃i
with a random noise generated from the kernel density with scale given
by the bandwidth h. So we obtain:
\[ \tilde{\tilde\delta}_i = \tilde\delta_i + h\,\varepsilon_i, \quad i = 1, \ldots, n, \]
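Steps [1] and [2] above can be sketched as follows. The sketch covers only these two steps; it assumes a standard Gaussian kernel for the noise εi, and the function name `smoothed_draws` is hypothetical.

```python
import random


def smoothed_draws(scores, h, rng):
    """Steps [1]-[2]: resample the reflected scores, then add kernel noise.

    scores : original efficiency scores (all >= 1)
    h      : bandwidth
    rng    : a random.Random instance, passed in for reproducibility
    """
    # Set of the 2n reflected scores {2 - d_1, ..., 2 - d_n, d_1, ..., d_n}.
    reflected = [2.0 - d for d in scores] + list(scores)
    n = len(scores)
    # Step [1]: naive draw of size n with replacement.
    naive = [rng.choice(reflected) for _ in range(n)]
    # Step [2]: perturb each draw with noise of scale h (Gaussian kernel assumed).
    return [d + h * rng.gauss(0.0, 1.0) for d in naive]
```

With h = 0 the output reduces to a naive resample of the reflected scores, which makes the role of the smoothing step easy to see.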
The naive bootstrap is inconsistent because there is no reason why this proba-
bility should be equal to this fixed number, independently of any feature of the
real DGP P. In fact, if f (δ) is continuous on [1, ∞), the probability should be
zero since in this case:
\[ \mathrm{Prob}\big(\hat\delta(x, y) - \delta(x, y) = 0 \mid P\big) = 0. \]
17 See Ferrier and Hirschberg, 1997; Simar and Wilson, 1999a; Ferrier and Hirschberg, 1999; Simar and
Wilson, 1999b.
The support of the probability HXY (·, ·) is the production set Ψ and HXY (x, y)
can be interpreted as the probability for a unit operating at the level (x, y) to
be dominated. Daraio and Simar (2005a) point out that this function is a non-
standard distribution function, having a cumulative distribution form for X and
a survival form for Y . In the input oriented framework, this joint probability
can be decomposed as follows:
where FX|Y (x|y) is the conditional distribution function of X and SY (y) is the
survivor function of Y ; we suppose the conditional distribution and survival
functions exist (i.e., SY (y) > 0 and FX (x) > 0). The conditional distribution
FX|Y is non-standard due to the event describing the condition (i.e., Y ≥ y
instead of Y = y, the latter being assumed in a standard regression framework).
We can now define the efficiency scores (in a radial sense) in terms of the
support of these probabilities. The input oriented efficiency score θ(x, y) for
(x, y) ∈ Ψ is defined for all y with SY (y) > 0 as:
θ(x, y) = inf{θ | FX|Y (θx|y) > 0} = inf{θ | HXY (θx, y) > 0}. (4.3)
The idea here is that the support of the conditional distribution FX|Y (· | y)
can be viewed as the attainable set of input values X for a unit working at the
output level y. It can be shown that under the free disposability assumption, the
lower boundary of this support (in a radial sense) provides the Farrell-efficient
frontier, or the input benchmarked value.
A nonparametric estimator is then easily obtained replacing the unknown
FX|Y (x | y) by its empirical version:
\[ \hat F_{X|Y,n}(x \mid y) = \frac{\sum_{i=1}^{n} 1\!\mathrm{I}(X_i \le x,\, Y_i \ge y)}{\sum_{i=1}^{n} 1\!\mathrm{I}(Y_i \ge y)}, \tag{4.4} \]
where 1I(·) is the indicator function that has to be read as follows: 1I(k) = 1 if
k is true, 1I(k) = 0 otherwise.
The resulting estimator of the input efficiency score for a given point (x, y)
coincides with the FDH estimator of θ(x, y):
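\[ \hat\theta_{FDH}(x, y) = \inf\{\theta \mid (\theta x, y) \in \hat\Psi_{FDH}\} \tag{4.5} \]
\[ \phantom{\hat\theta_{FDH}(x, y)} = \inf\{\theta \mid \hat F_{X|Y,n}(\theta x \mid y) > 0\}. \tag{4.6} \]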
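Expression (4.6) can be evaluated without any search over θ: the empirical cdf at θx becomes positive as soon as one observation with Yi ≥ y satisfies Xi ≤ θx, so the infimum is the smallest value of max_j Xi^j / x^j over those observations. A minimal sketch, with the hypothetical function name `fdh_input_efficiency` and inputs and outputs stored as tuples:

```python
def fdh_input_efficiency(x, y, X, Y):
    """FDH input efficiency of (x, y) against observations (X[i], Y[i]).

    x, y : tuples of input and output levels of the evaluated unit
    X, Y : lists of tuples holding the observed inputs and outputs
    """
    # Units producing at least y in every output component.
    dominating = [Xi for Xi, Yi in zip(X, Y)
                  if all(a >= b for a, b in zip(Yi, y))]
    if not dominating:
        raise ValueError("no observation with Y_i >= y")
    # Smallest theta such that some dominating unit satisfies X_i <= theta * x.
    return min(max(a / b for a, b in zip(Xi, x)) for Xi in dominating)
```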
In the output oriented framework, the probability function HXY (·, ·) may be
decomposed as follows:
HXY (x, y) = Prob(Y ≥ y | X ≤ x) Prob(X ≤ x)
= SY |X (y|x) FX (x), (4.7)
where SY |X (y|x) = Prob(Y ≥ y | X ≤ x) denotes the conditional survivor
function of Y and FX (x) = Prob(X ≤ x) denotes the distribution function of
X that we assume exists, i.e. FX (x) > 0.
The output efficiency score may be defined accordingly:
λ(x, y) = sup{λ | SY |X (λy|x) > 0} = sup{λ | HXY (x, λy) > 0}. (4.8)
As for the input oriented case, a nonparametric estimator of λ(x, y) is
obtained by plugging into Equation (4.8) the empirical conditional survival function
ŜY|X,n(y|x) given by:
\[ \hat S_{Y|X,n}(y \mid x) = \frac{\hat H_{XY,n}(x, y)}{\hat H_{XY,n}(x, 0)}, \tag{4.9} \]
where
\[ \hat H_{XY,n}(x, y) = \frac{1}{n} \sum_{i=1}^{n} 1\!\mathrm{I}(x_i \le x,\, y_i \ge y). \tag{4.10} \]
Again, this estimator coincides with the FDH estimator of λ(x, y).
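The output-oriented counterpart follows the same logic: the empirical survival function at λy stays positive as long as one observation with Xi ≤ x satisfies Yi ≥ λy, so the supremum in (4.8) is the largest value of min_j Yi^j / y^j over those observations. A minimal sketch, with the hypothetical function name `fdh_output_efficiency`:

```python
def fdh_output_efficiency(x, y, X, Y):
    """FDH output efficiency of (x, y) against observations (X[i], Y[i])."""
    # Units using no more than x in every input component.
    comparable = [Yi for Xi, Yi in zip(X, Y)
                  if all(a <= b for a, b in zip(Xi, x))]
    if not comparable:
        raise ValueError("no observation with X_i <= x")
    # Largest lambda such that some comparable unit satisfies Y_i >= lambda * y.
    return max(min(a / b for a, b in zip(Yi, y)) for Yi in comparable)
```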
The FDH estimator Ψ̂FDH, as well as its convex version Ψ̂DEA, are very
sensitive to extremes and outliers, since, as estimators of the “full” set Ψ, they
envelop all the data points of the observed set X (this is seen by looking at
the inf and sup operators in (4.6) and (4.8)). The corresponding frontiers of
Ψ̂FDH and Ψ̂DEA can be viewed as estimators of the “full” frontier of Ψ. As
an alternative, partial frontiers can be investigated. They do not correspond
to the boundary of Ψ and are such that the full frontier can be viewed as a
limiting case of the partial frontiers. These frontiers correspond to another
benchmark frontier against which DMUs will be compared. The advantage is
that their nonparametric estimators will not envelop all the data points and so
will be more robust to extreme and outlying data points. Two partial frontiers
have been investigated in the literature: the order-m frontiers and the order-α
quantile frontiers. They are introduced in the next two sections.
The relations between φ and φm remain valid between their empirical
counterparts φ̂n and φ̂m,n. For all finite values of m we have: φ̂m,n ≥ φ̂n.
We remark that in the standard case φ̂n ≤ Xi, i = 1, ..., n, but this is no
longer the case for the order-m frontier estimator φ̂m,n, even for large values of
m. The reasons for this different behavior of φ̂m,n with respect to φ̂n are mainly
due to the expectation operator in the definition of φm (see equation (4.11)) and
to the finiteness of m.
The extension to the bivariate case is straightforward: we consider the process
generating the input levels X by the conditional distribution of X given that
Y ≥ y. The full frontier function φ(y) is defined as the minimal achievable
input level for producing at least the output y. It may be written as:
Now, given a fixed integer value of m ≥ 1, we can define the (expected) order-
m lower boundary of X for DMUs producing at least y, as the expected value
of the minimum of m random variables X 1 , ..., X m drawn from the distribution
Again, for all values of y and for all finite values of m, φm (y) ≥ φ(y) and for
all y, lim (m→∞) φm (y) = φ(y).
See Figure 4.1 for an illustration.
Figure 4.1. Input order-m frontier in the bivariate case. For any value of y, φm(y) =
E[min(X^1, . . . , X^m) | Y ≥ y]. Here the stars X^1, . . . , X^7 are m = 7 draws from
FX|Y(x | Y ≥ y). The plot shows FX|Y(x | Y ≥ y) against values of input x, with φ(y) and
φ7(y) marked.
θ̃m (x, y) is a random variable since the Xi are random variables generated by
FX|Y (x | y).
The order-m input efficiency measure is defined, according to Daraio and
Simar (2005a), as follows:
Hence, in place of looking for the lower boundary of the support of FX|Y (x | y),
as was typically the case for the full-frontier and for the efficiency score θ(x, y),
the order-m efficiency score can be viewed as the expectation of the minimal
input efficiency score of the unit (x, y), when compared to m units randomly
drawn from the population of units producing more outputs than the level y.
This is certainly a less extreme benchmark for the unit (x, y) than the “absolute”
minimal achievable level of inputs: it is compared to a set of m peers producing
more than its level y and we take as benchmark, the expectation of the minimal
achievable input in place of the absolute minimal achievable input.
Order-m frontiers are estimators of the frontier, that for finite m, do not
envelop all the observed data points and therefore, are less sensitive to extreme
points and/or to outliers. As m increases and for fixed n, θ̂m,n (x, y) → θ̂n (x, y).
Daraio and Simar (2005b) define convex and local convex order-m frontiers
as well as a practical method to compute them.
Consider the firm (x, y); it produces a level of output y using a quantity x of
inputs. We recall that φm (y) is not the efficient frontier of the production set,
but it gives the expected minimum input among a fixed number of m potential
competing firms producing more than y. The comparison of x with φm (y) is
important, from an economic point of view, as it gives a clear indication of
how efficient the firm is, compared with these m potential firms. The value m
represents the number of potential firms (drawn from the population of firms)
producing at least the output level of y, against which we want to benchmark
the analyzed firm.
Let us give some examples of the economic meaning of the order-m input
efficiency measures. If a firm (x, y) has an efficiency score θ̂m,n(x, y) = 0.9,
it would have to reduce its inputs radially by 10% to reach the expected value
of the minimum input level of m other firms drawn from the population of firms
producing a level of output ≥ y. If instead θ̂m,n(x, y) = 1.4, the firm uses less
input than this benchmark: it could increase its inputs proportionately by 40%
before reaching it. Finally, if θ̂m,n(x, y) = 1, the firm (x, y) uses the same
level of inputs as the expected value of the minimum input level of m other
firms drawn from the population of firms producing at least y of output, i.e.
the firm is on the order-m frontier in the input direction.
Computational aspects
For the computation of the order-m efficiency score θ̂m,n(x, y), the univariate
integral (4.23) can be evaluated by numerical methods18, even when the number of
inputs p ≥ 1.
However, numerical integration can be avoided by an easy Monte-Carlo
algorithm, proposed by Cazals, Florens and Simar (2002) and described
below, which is fast for small values of m such as m = 10 but much slower when m
increases:
[1] For a given y, draw a sample of size m with replacement among those Xi
such that Yi ≥ y and denote this sample by (X1,b, . . . , Xm,b).
[2] Compute
\[ \tilde\theta^{b}_{m}(x, y) = \min_{i=1,\ldots,m} \; \max_{j=1,\ldots,p} \frac{X^{j}_{i,b}}{x^{j}}. \]
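Steps [1] and [2] can be sketched as below. The remaining steps are not reproduced in this passage, so the sketch assumes that the two steps are repeated B times and that the B values are averaged, which is the natural Monte Carlo approximation of the expectation defining the order-m score. The function name `order_m_input_efficiency` and the default B = 200 are assumptions made for the illustration.

```python
import random


def order_m_input_efficiency(x, y, X, Y, m, B=200, seed=0):
    """Monte Carlo approximation of the order-m input efficiency of (x, y)."""
    rng = random.Random(seed)
    # Inputs of the units producing at least y.
    pool = [Xi for Xi, Yi in zip(X, Y)
            if all(a >= b for a, b in zip(Yi, y))]
    if not pool:
        raise ValueError("no observation with Y_i >= y")
    total = 0.0
    for _ in range(B):
        # Step [1]: draw m units with replacement from the pool.
        sample = [rng.choice(pool) for _ in range(m)]
        # Step [2]: min over the draws of the max input ratio.
        total += min(max(a / b for a, b in zip(Xi, x)) for Xi in sample)
    # Assumed final step: average over the B replications.
    return total / B
```

For large m almost every replication contains the most efficient peer, so the returned value approaches the FDH score; for small m it lies above it.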
18 For the numerical integration we use the built-in Matlab “quad” procedure (based on adaptive Simpson
quadrature).
of outputs. Here, for the same unit, the benchmark will be the order-α quantile
frontier defined as the input level not exceeded by (1 − α) × 100-percent of
firms among the population of units producing at least a level y of outputs.
Formally:
[Figure: the conditional cdf FX|Y(x | Y ≥ y) plotted against values of input x, with φ(y), φ0.90(y) and the level 1 − α = 0.10 marked.]
As for the order-m frontier, this concept can be easily extended to the multiple
inputs case by defining the order-α input efficiency score for a unit operating
at the level (x, y), as follows:
has to reduce its input to the level θα (x, y)x to reach the input efficient frontier
of level α × 100%. Note that here θα (x, y) can be greater than one indicating
that a firm (x, y) can increase its input by a factor θα (x, y) to reach the same
frontier. Therefore, this latter firm is considered as super-efficient with respect
to the order-α frontier level.
Nonparametric estimators are easily obtained by plugging, as for the order-m
frontiers, the empirical cdf into the expression above. We have:
φ̂α,n(y) = inf{x | F̂X|Y,n(x | y) > 1 − α}, (4.26)
and for the multivariate input case, we obtain:
θ̂α,n(x, y) = inf{θ | F̂X|Y,n(θx | y) > 1 − α}. (4.27)
Here again it appears clearly that when α → 1, θ̂α,n(x, y) converges to the
FDH input efficiency score θ̂FDH(x, y).
The nonparametric estimators of the order-α frontier and efficiency scores share
the same properties as their order-m analogs. In summary, they are √n-consistent
estimators of their population analogs, and they are asymptotically unbiased
and normally distributed with a known expression for the variance (see
Aragon, Daouia and Thomas-Agnan (2005) for the properties of φ̂α,n(y) and
Daouia and Simar (2004) for those of θ̂α,n(x, y)). Using tools of robustness
theory, it is shown in Daouia and Simar (2004) that the order-α frontiers are
more robust to extremes than the order-m frontiers.
where N denotes the set of all nonnegative integers. If x is univariate, φα,n (y) =
θα,n (x, y)x. Therefore, the computation of θα,n (x, y) is very fast and very easy
since it only implies sorting routines.
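The sorting argument can be made concrete directly from (4.27). Among the M units with Yi ≥ y, the empirical cdf at θx is the share of units whose ratio max_j Xi^j / x^j does not exceed θ, so the infimum is the k-th smallest ratio, where k is the smallest integer with k/M > 1 − α. The sketch below is an illustration derived from (4.27), not the closed-form expression of the text; the function name `order_alpha_input_efficiency` and the small tolerance guarding the floating-point comparison are assumptions.

```python
def order_alpha_input_efficiency(x, y, X, Y, alpha):
    """Order-alpha input efficiency of (x, y) computed by sorting."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    # Input ratios of the units producing at least y, in increasing order.
    ratios = sorted(
        max(a / b for a, b in zip(Xi, x))
        for Xi, Yi in zip(X, Y)
        if all(a >= b for a, b in zip(Yi, y))
    )
    if not ratios:
        raise ValueError("no observation with Y_i >= y")
    M = len(ratios)
    for k in range(1, M + 1):
        # First k whose empirical cdf value k / M exceeds 1 - alpha
        # (tolerance added to avoid spurious ties in floating point).
        if k / M > 1.0 - alpha + 1e-12:
            return ratios[k - 1]
    return ratios[-1]
```

With alpha = 1 the loop stops at k = 1 and the function returns the smallest ratio, that is, the FDH score.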
As discussed in Cazals, Florens and Simar (2002) and Aragon, Daouia and
Thomas-Agnan (2005), for the univariate input case, the partial frontiers φm(y)
and φα(y) are monotone functions of y. However, the nonparametric estimators
φ̂m,n(y) and φ̂α,n(y) may fail to be monotone in y in finite samples. Daouia and
Simar (2005) propose an easy way to isotonize these estimators to achieve the
appropriate monotonicity. It is shown that these isotonized versions are even
more robust to extremes and outliers than the original nonparametric estimators.
As pointed out by Daouia and Simar (2004), for every attainable point (x, y) ∈ Ψ,
there exists an α such that θα (x, y) = 1. This α could serve as an alternative
measure of input efficiency score. If FX|Y (x | y) is continuous in x, this
quantity is given (input orientation) by:
αinput (x, y) = 1 − FX|Y (x | y). (4.28)
In other words, one may set the estimated performance measure for a unit
operating at the level (x, y) to be the order α of the estimated quantile frontier
which passes through this unit. This new concept of efficiency, the α efficiency,
is illustrated in Figure 4.3. Suppose that we want to measure the efficiency score
of a unit located at the point A (this is a unit which produces a level of output y
using a level of input indicated by the point A on the x-axis). Its input efficiency
score is equal to αinput (A) = 0.30 since 70% of the units producing at least
the level y of output are using less input than unit A.
This idea has been first proposed by Aragon, Daouia and Thomas-Agnan
(2003) in the univariate case: they analyze the properties of these measures and
the properties of their nonparametric estimators. The multivariate extension
comes from Daouia and Simar (2004). These nonparametric estimators are ob-
tained by using the empirical counterparts of the distribution function. We have
to take into account the discreteness of empirical distributions. It can be shown
that the correct expression for the nonparametric estimator of αinput (xi , yi ), in
the input orientation, is given by:
\[ \hat\alpha_{input}(x_i, y_i) = 1 - \hat F_{X|Y,n}(x_i \mid y_i) + \frac{1}{M_{y_i}}. \tag{4.29} \]
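Expression (4.29) amounts to simple counting. The sketch below assumes that M_y denotes the number of observations with Yi ≥ y (its definition is not reproduced in this passage) and uses the hypothetical function name `alpha_input_efficiency`.

```python
def alpha_input_efficiency(xi, yi, X, Y):
    """Estimated alpha efficiency (input orientation) of the observed unit (xi, yi)."""
    # Units producing at least yi; their number plays the role of M_y (assumed).
    peers = [Xj for Xj, Yj in zip(X, Y)
             if all(a >= b for a, b in zip(Yj, yi))]
    M = len(peers)
    # Peers using no more than xi in every input (the unit itself included).
    dominated_by = sum(all(a <= b for a, b in zip(Xj, xi)) for Xj in peers)
    return 1.0 - dominated_by / M + 1.0 / M
```

For the unit with the smallest input among its peers the empirical cdf equals 1/M, so the estimate is exactly one.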
In Figure 4.4 we illustrate all the nonparametric and robust measures introduced
in the previous sections (input oriented framework). The illustration is presented
for α = 0.90 and m = 7. In this figure, φ(y) is the full frontier level, it is given
by the left boundary of the support of FX|Y (· | y), φα (y) corresponds to the
(1 − α) quantile of FX|Y (· | y) and φm (y) is the expectation of the minimum
of m virtual data points (here 7) generated by FX|Y (· | y). We represent by the
stars on the x-axis, 7 potential values of these random data points and we show
where φ7 (y) could be around. The measure αinput is the new probabilistic
efficiency measure defined above.
Figure 4.4. Illustration of full and partial frontiers in the input orientation. Here m = 7 and
α = 0.90: the solid curve is the conditional cdf FX|Y(x | y) = Prob(X ≤ x | Y ≥ y). The
stars on the x-axis represent 7 potential observations generated by FX|Y(· | y).
where µN W,0 is a constant described in Park, Simar and Weiner (2000). Order-
m estimators share the same properties and the same is true for the output
oriented measures.
See Daraio and Simar (2005a), Daouia and Simar (2004, 2005) and Daouia,
Florens and Simar (2005) for illustrations of these robustness properties in
from n (the number of analyzed firms, the sample size), the values of m might
be fixed by considering the number of potential competitors against which we
want to benchmark our firm “more realistically”. Furthermore, in most empirical
applications of order-m efficiency measures, we noted that for m ≥ 200
the order-m efficiency score is almost equal to the FDH efficiency score, i.e.
the asymptotic result limm→∞ θ̂m,n(x, y) = θ̂FDH,n(x, y) holds in practice
already for values such that m ≥ 200. Guided by these considerations,
we can define a grid of values for m to use in the sensitivity analysis, to build
simulated competitive scenarios that can be particularly useful for the analysis
of industries and markets with intensive and dynamic competition.
The economic meaning of order-α and α measures of efficiency is very
interesting and useful. While order-α measures of efficiency can be roughly considered
as a kind of “continuous version” of their order-m counterparts, α measures
also have an immediate economic meaning. They are based on the idea that there
exists, for each firm in the comparison set, a quantile frontier which passes through
it, on which the firm is efficient (either along the input dimension or along the
output dimension). If the quantile on which the firm is efficient (in the input
orientation) is 0.2, this means that 80% (1 − 0.2 = 0.8) of the firms in
the comparison set (firms producing at least the same level of outputs)
outperform the considered firm by using fewer inputs. We can therefore interpret
1 − α as a firm’s probability of being dominated on the input dimensions by
the other firms producing at least the same level of output. The
interpretation for the output oriented case is analogous.
the partial frontier being detected by efficiency scores less than one, the grid
would be defined by 1 − τ ).
Thus for the order-m scores, the main steps of the computations are:
[1] Compute for each data point (Xi, Yi), for i = 1, ..., n, its leave-one-out
input efficiency score, i.e. its order-m input efficiency score leaving out
the observation (Xi, Yi) from the reference set. Denote by θ̂^{(i)}_{m,n}(Xi, Yi)
the “leave-one-out” efficiency score and the corresponding reference set
by X^{(i)}.
[2] Compute θ̂^{(i)}_{m,n}(Xi, Yi), for i = 1, ..., n, for several reasonable values of
m, e.g. m = 10, 25, 50, 75, 100, 150.
[3] Compute also the number of points used to estimate the conditional
distribution function FX|Y(x | y ≥ Yi), i.e. the number of points in X^{(i)} with
y ≥ Yi. Denote this number as Ninput(Xi, Yi). It is the number of points
used to estimate the p-variate distribution function. If Ninput(Xi, Yi) is
small or even equal to zero, the corresponding point (Xi, Yi) lies at the
border of the sample values X. Ninput(Xi, Yi) thus indicates how near the
point (Xi, Yi) is to the border of the support of data points.
[4] For each value in the grid τ, plot the percentage of points in the sample X with
\[ \hat\theta^{(i)}_{m,n}(X_i, Y_i) \ge 1 + \tau \tag{4.30} \]
as a function of m. This curve represents the percentage of points outside
the order-m frontier as a function of m for all the threshold values defined
by the grid τ.
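The four steps above can be sketched as follows for the input orientation. The sketch makes several assumptions that are not in the text: the order-m scores are approximated by Monte Carlo with B replications, a point whose leave-one-out reference set is empty is given an infinite score (so it always counts as outside the frontier), and the function names are hypothetical. The plotting itself is left out; `percent_outside` returns the ordinate of the curve for one m and one τ.

```python
import random


def order_m_score(x, pool, m, B, rng):
    """Monte Carlo order-m input score of x against a pool of input vectors."""
    total = 0.0
    for _ in range(B):
        draws = [rng.choice(pool) for _ in range(m)]
        total += min(max(a / b for a, b in zip(Xi, x)) for Xi in draws)
    return total / B


def leave_one_out_scores(X, Y, m, B=200, seed=0):
    """Steps [1]-[3]: leave-one-out order-m scores and the counts N_input."""
    rng = random.Random(seed)
    scores, n_input = [], []
    for i, (xi, yi) in enumerate(zip(X, Y)):
        # Reference set without unit i, restricted to units with Y_j >= Y_i.
        pool = [Xj for j, (Xj, Yj) in enumerate(zip(X, Y))
                if j != i and all(a >= b for a, b in zip(Yj, yi))]
        n_input.append(len(pool))
        # Assumption: an empty reference set yields an infinite score.
        scores.append(order_m_score(xi, pool, m, B, rng) if pool else float("inf"))
    return scores, n_input


def percent_outside(scores, tau):
    """Step [4]: percentage of points with leave-one-out score >= 1 + tau."""
    return 100.0 * sum(s >= 1.0 + tau for s in scores) / len(scores)
```

Calling `leave_one_out_scores` for each m in the grid and `percent_outside` for each τ gives the points of the curves described in step [4].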
The computations for the output-oriented case are mutatis mutandis the same
as above.
The interpretation of the results is based on the following statement: a
point lies outside the cloud of points of the data set X in the input direction (output
direction) when its order-m input (output) efficiency score is greater (smaller)
than one. The data points with input order-m efficiency measure greater than
one, even if m increases, or with small values of Ninput should be flagged as
being extremes. In the output oriented case, the points with order-m efficiency
score smaller than one, even if m increases, or with small values of Noutput
should also be flagged as extremes. When data points are detected as extremes
in both directions (input and output), they are flagged as potential outliers.
For doing this, we need to choose at which level we consider that m is
“large”. This may be achieved by looking at the plots obtained above, showing
the percentage of points outside order-m frontiers for the different threshold
values τ . By construction, these curves should decrease when m increases,
and if there are no outliers, they should converge approximately linearly to the
percentage of points having a leave-one-out FDH score greater than one (smaller
than one for the output-oriented case). As a consequence, any strong deviation
from linearity should indicate the potential existence of outliers: if the curves
show an elbow effect (a sharp negative slope followed by a smoothly decreasing
slope), they indicate that the points remaining outside the order-m frontier for
this value of m have to be further analyzed by the procedure described above
and possibly flagged as potential outliers.
Besides this, we also have to select m such that a reasonable percentage of
points remains outside the frontier. It has been suggested (Barnett and Lewis,
1995) to use the rule of thumb √n/n as a reasonable upper bound for the percentage
of outliers in a sample of size n.
Of course, once the potential outliers have been identified, they have to be
carefully analyzed to understand “why” they are outliers. Very often outliers
(when not due to errors) contain useful information on the process under
analysis (missing variables in the model, etc.).
From this, it is easily seen that limm→∞ λm (x, y) = λ(x, y). A nonparametric
estimator of λm (x, y) is given by:
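\[ \hat\lambda_{m,n}(x, y) = \int_0^{\infty} \left[ 1 - \big(1 - \hat S_{Y|X,n}(uy \mid x)\big)^m \right] du
= \hat\lambda_n(x, y) - \int_0^{\hat\lambda_n(x,y)} \big(1 - \hat S_{Y|X,n}(uy \mid x)\big)^m \, du. \tag{4.38} \]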
Let us give an example of the economic meaning of the order-m output
efficiency measure λ̂m,n(x, y) by looking at a firm operating at the level (x, y)
and such that λ̂m,n(x, y) = 1.8. This firm produces a level of output that is,
in radial terms, equal to 0.56 (i.e. 1/1.8) times the expected value of the
maximum level of output of m other firms drawn from the population of firms
using a level of inputs ≤ x. A value of λ̂m,n(x, y) = 0.5 would indicate that
the firm produces 2 = 1/0.5 times the expected value of the
maximum level of output of m other firms drawn from the same population.
As for the input oriented case, the computation of the order-m efficiency
λ̂m,n(x, y) may be done either by numerical integration, calculating the
univariate integral in (4.38), or by adapting the Monte Carlo algorithm presented in
Section 4.2 (for the input oriented case) as follows:
[1] For a given x, draw a sample of size m with replacement among those Yi
such that Xi ≤ x and denote this sample by (Y1,b, . . . , Ym,b).
[2] Compute
\[ \tilde\lambda^{b}_{m}(x, y) = \max_{i=1,\ldots,m} \; \min_{j=1,\ldots,q} \frac{Y^{j}_{i,b}}{y^{j}}. \]
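The output-oriented steps [1] and [2] mirror the input case. As before, the sketch assumes that the two steps are repeated B times and the results averaged to approximate the expectation; the function name `order_m_output_efficiency` and the default B = 200 are assumptions.

```python
import random


def order_m_output_efficiency(x, y, X, Y, m, B=200, seed=0):
    """Monte Carlo approximation of the order-m output efficiency of (x, y)."""
    rng = random.Random(seed)
    # Outputs of the units using no more than x.
    pool = [Yi for Xi, Yi in zip(X, Y)
            if all(a <= b for a, b in zip(Xi, x))]
    if not pool:
        raise ValueError("no observation with X_i <= x")
    total = 0.0
    for _ in range(B):
        # Step [1]: draw m units with replacement from the pool.
        sample = [rng.choice(pool) for _ in range(m)]
        # Step [2]: max over the draws of the min output ratio.
        total += max(min(a / b for a, b in zip(Yi, y)) for Yi in sample)
    # Assumed final step: average over the B replications.
    return total / B
```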
It follows,
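\[ \hat\lambda_{\alpha,n}(x, y) = \begin{cases} \mathcal{Y}^{x}_{(\alpha N_x)} & \text{if } \alpha N_x \in \mathbb{N}^{*}, \\ \mathcal{Y}^{x}_{([\alpha N_x]+1)} & \text{otherwise,} \end{cases} \tag{4.43} \]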
where N ∗ denotes the set of positive integers and [αNx ] denotes the integral
part of αNx .
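Expression (4.43) reduces to picking one order statistic. The sketch below assumes that the ordered values in (4.43) are the increasing order statistics of min_k Yi^k / y^k over the Nx units with Xi ≤ x (their definition is not reproduced in this passage). The function name `order_alpha_output_efficiency` and the tolerance used to decide whether αNx is an integer are also assumptions.

```python
import math


def order_alpha_output_efficiency(x, y, X, Y, alpha):
    """Order-alpha output efficiency of (x, y) via one order statistic."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    # Output ratios of the units using no more than x, in increasing order.
    ratios = sorted(
        min(a / b for a, b in zip(Yi, y))
        for Xi, Yi in zip(X, Y)
        if all(a <= b for a, b in zip(Xi, x))
    )
    if not ratios:
        raise ValueError("no observation with X_i <= x")
    k = alpha * len(ratios)
    nearest = round(k)
    if nearest >= 1 and abs(k - nearest) < 1e-9:
        # alpha * N_x is a positive integer: take that order statistic.
        return ratios[nearest - 1]
    # Otherwise take the order statistic of rank [alpha * N_x] + 1.
    return ratios[math.floor(k)]
```

With alpha = 1 the function returns the largest ratio, which is the FDH output efficiency score.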
Figure 4.5 also illustrates the main efficiency measures introduced in this section
for the output oriented framework. Here α = 0.90, m = 7, and ψ(x) is the full
frontier level, given by the right boundary of the support of SY|X(· | x).
ψα(x) corresponds to the (1 − α) quantile of SY|X(· | x) and ψm(x) is the
expectation of the maximum of m virtual data points generated by SY|X(· | x).
The stars on the horizontal axis are 7 potential values of these random data points,
and ψ7(x) shows where ψm(x) could be located.
Figure 4.5. Illustration of full and partial frontiers in the output orientation. Here m = 7 and
α = 0.90: the solid curve is the conditional survival function SY|X(· | x) = Prob(Y ≥ y |
X ≤ x). The stars on the x-axis represent 7 potential observations generated by SY|X(· | x).
drawbacks that will be briefly described below. Florens and Simar (2005)
propose a new method which overcomes most of these drawbacks by providing the
best parametric approximation of a frontier which is nonparametrically
estimated in a first stage. Using the robust version of the nonparametric estimators
in this first step (order-m as in Florens and Simar, 2005, or order-α as in Daouia,
Florens and Simar, 2005), we obtain in the end estimators of the parameters of
the models that share nice statistical properties (√n-consistency and asymptotic
normality). So inference on the parameters, or on functions of the parameters,
is very easy.
In this section, we will present the ideas in the output orientation where the
output y is univariate y ∈ R+ and the input x ∈ Rp+ , hence we are interested
in a production function. The same could be done for an input function where
x ∈ R+ and y ∈ Rq+ , by using input orientation. In Section 4.7, we will see
how these concepts can be adapted to a full multivariate setup with multiple
inputs and outputs.
Consider a suitable parametric family of production function defined on Rp+ :
Cobb-Douglas or Translog models are often used for this parametric family.
The parametric model for the frontier can be written as:
y = ϕ(x; θ) − u,
the cloud of points, whereas the frontier (and its characteristics) are properties
of the boundary of the observed cloud of points. Besides, in addition to the
independence assumption between u and x, these methods require specific
parametric distributional assumptions for the efficiency term u (at least for COLS
and MLE), whereas in general very little is known a priori about the shape of this
distribution. Finally, as with most deterministic models, the procedure is very
sensitive to extremes or outliers.
Of course if the parametric model is true, then θ0 is the true value of the para-
meter.
We can similarly define the pseudo-true values for the partial frontiers ϕm (·)
or ϕα (·), as follows:
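\[ \theta_0^{m} = \arg\min_{\theta} \sum_{i=1}^{n} \big(\varphi_m(x_i) - \varphi(x_i; \theta)\big)^2, \tag{4.47} \]
\[ \theta_0^{\alpha} = \arg\min_{\theta} \sum_{i=1}^{n} \big(\varphi_\alpha(x_i) - \varphi(x_i; \theta)\big)^2. \tag{4.48} \]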
leads to:
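\[ \hat\theta_n = \arg\min_{\theta} \sum_{i=1}^{n} \big(\hat\varphi_{FDH,n}(x_i) - \varphi(x_i; \theta)\big)^2, \tag{4.49} \]
\[ \hat\theta_n^{m} = \arg\min_{\theta} \sum_{i=1}^{n} \big(\hat\varphi_{m,n}(x_i) - \varphi(x_i; \theta)\big)^2, \tag{4.50} \]
\[ \hat\theta_n^{\alpha} = \arg\min_{\theta} \sum_{i=1}^{n} \big(\hat\varphi_{\alpha,n}(x_i) - \varphi(x_i; \theta)\big)^2. \tag{4.51} \]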
The statistical properties of these estimators are derived in Florens and Simar
(2005) and in Daouia, Florens and Simar (2005). They can be summarized as
follows:
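\[ \hat\theta_n - \theta_0 = o_p(1), \]
\[ \sqrt{n}\,(\hat\theta_n^{m} - \theta_0^{m}) \overset{\text{approx.}}{\sim} N_k(0, V_m), \]
\[ \sqrt{n}\,(\hat\theta_n^{\alpha} - \theta_0^{\alpha}) \overset{\text{approx.}}{\sim} N_k(0, V_\alpha), \]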
where an explicit expression is obtained for Vm and Vα . In practice however,
it is simpler to use a bootstrap algorithm, that we describe below19 , to provide
consistent estimators of these variances. Note that for the full frontier parame-
ters we only have consistency and not the asymptotic normality. This is another
argument to favor the use of partial frontiers in the first step calculations. In
addition, as explained below, by choosing m or α large enough, we estimate
also the full frontier itself.
Note also that here no particular assumption is made on the error term when
fitting the parametric model: we do not need a particular parametric distribution,
we do not require homoscedasticity, the error term can be related to the level
of the inputs x. Hence, clearly, most of the drawbacks of the regression-type
estimators are overcome with this two-stage approach.
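The two-stage idea behind (4.49) can be sketched for the simplest possible case. The sketch assumes one input and one output, uses the FDH frontier estimate max{Yj | Xj ≤ xi} in the first stage, and takes as parametric family the straight line ϕ(x; θ) = θ0 + θ1 x, chosen only because it is linear in θ so that the second stage is ordinary least squares. The function names `fdh_frontier` and `ols_line` are hypothetical.

```python
def fdh_frontier(X, Y):
    """First stage: FDH production frontier evaluated at each observed x_i."""
    return [max(yj for xj, yj in zip(X, Y) if xj <= xi) for xi in X]


def ols_line(x, z):
    """Second stage: least-squares fit of z on (1, x); returns (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_z = sum(z) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxz = sum((a - mean_x) * (b - mean_z) for a, b in zip(x, z))
    slope = sxz / sxx
    return mean_z - slope * mean_x, slope


# Usage: fit the line to the estimated frontier values, not to the raw outputs.
X = [1.0, 2.0, 3.0, 4.0]
Y = [2.0, 1.0, 6.0, 5.0]
frontier = fdh_frontier(X, Y)
theta0_hat, theta1_hat = ols_line(X, frontier)
```

Replacing `fdh_frontier` by an order-m or order-α frontier estimate would give the robust versions (4.50) and (4.51).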
We have seen above that ϕ̂m(n),n and ϕ̂α(n),n are robust estimators of ϕ, the
full frontier itself, if m(n) → ∞ and α(n) → 1 when n → ∞. Daouia, Florens
and Simar (2005) prove that if m(n) and α(n) are such that:
\[ \lim_{n\to\infty} m(n) = \infty, \qquad \lim_{n\to\infty} m(n)\,(\log n / n)^{1/2} = 0, \qquad \lim_{n\to\infty} n\,(1 - \alpha(n)) = 0. \]
19 See also the appendix of Florens and Simar (2005) for additional details.
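[1] Draw a random sample of size n with replacement from X = {(xi, yi) | i =
1, . . . , n} to obtain the bootstrap sample X∗b = {(x∗i,b, y∗i,b) | i = 1, . . . , n}.
[2] With this sample X∗b, compute √n(θ̂^{α,∗}_{b,n} − θ^{α,∗}_{0,b}) where:
\[ \theta^{\alpha,*}_{0,b} = \arg\min_{\theta} \frac{1}{n} \sum_{i=1}^{n} \big(\hat\varphi_{\alpha,n}(x^{*}_{i,b}) - \varphi(x^{*}_{i,b}; \theta)\big)^2, \tag{4.52} \]
\[ \hat\theta^{\alpha,*}_{b,n} = \arg\min_{\theta} \frac{1}{n} \sum_{i=1}^{n} \big(\hat\varphi^{*}_{\alpha,n}(x^{*}_{i,b}) - \varphi(x^{*}_{i,b}; \theta)\big)^2, \tag{4.53} \]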
estimators20 are evaluated in (4.52) and (4.53) at the bootstrap values x∗i,b ,
i = 1, . . . , n.
From its definition and the discussion in Section 2.3, it is easily seen that
the distance function shares the following properties: (i) for all (x, y) ∈ Ψ,
δ(x, y) ≤ 1; (ii) δ(x, y) = 1, if and only if (x, y) is on the efficient boundary
of Ψ; (iii) δ(x, y) is homogeneous of degree one in y: δ(x, ηy) = ηδ(x, y) for
all η > 0. Of course, the parametric models proposed for approximating these
distances should be constrained to satisfy their properties.
20 In the common case where ϕ (x; θ) is linear in θ, the solutions of (4.52) and (4.53) are obtained by simple
α
OLS techniques. Otherwise nonlinear least squares methods have to be used.
The order-m output distance function δm(x, y) and the order-α output
distance function δα(x, y) can also be considered by taking the inverse of the
corresponding λ measures defined in the preceding sections. These partial frontiers
are to be preferred if we want to be more robust to extreme data points.
Consider now a parametric family of functions defined on $\mathbb{R}^p_+ \times \mathbb{R}^q_+$, denoted by $\{\varphi(\cdot, \cdot; \theta) \mid \theta \in \Theta \subset \mathbb{R}^k\}$, such that:
The same procedure can be followed with partial frontiers to get more robust estimators of $\theta_0$. This is achieved by using $\hat\delta_m$ or $\hat\delta_\alpha$ in place of $\hat\delta_{FDH}$ in Equation (4.55).
By Florens and Simar (2005) and Daouia, Florens and Simar (2005), the resulting estimators share the same statistical properties as in the preceding section, namely consistency for the full frontier approximation, and $\sqrt{n}$-consistency and asymptotic normality for the partial frontier parameters.
In the following subsections we present two examples of parametric mod-
els for ln δ(x, y) based on the Generalized Cobb-Douglas and the Translog
functions.
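where $\tilde y_i^2 = y_i^2 / y_i^1$ and $\beta_1 = 1 - \beta_2' i_{q-1}$. Finally the problem for defining the pseudo-true value can be written as:
\[
\theta_0 = \arg\min_{\alpha_0, \alpha, \beta_2} \sum_{i=1}^{n} \left( -\ln y_i^{*,1} - \left[ \alpha_0 + \alpha' \ln x_i + \beta_2' \ln \tilde y_i^2 \right] \right)^2, \tag{4.57}
\]
and then $\hat\beta_{1,n} = 1 - \hat\beta_{2,n}' i_{q-1}$. The same could be done for $\hat\theta_n^m$ and for $\hat\theta_n^\alpha$.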
Let’s now compare this approach with the classical approach by Grosskopf,
Hayes, Taylor and Weber (1997) and Coelli (2000), where the following model
is estimated by COLS, MOLS or MLE:
\[
-\ln y_i^1 = \alpha_0 + \alpha' \ln x_i + \beta_2' \ln \tilde y_i^2 + u_i,
\]
where ui > 0 is considered as the inefficiency term. The difference comes from
the left-hand side term of the equation and from the stochastics on the efficiency
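\begin{align*}
\beta' i_q &= 1, && \text{1 constraint} \\
\Gamma_{12}\, i_q &= 0, && p \text{ constraints} \\
\Gamma_{22}\, i_q &= 0, && q \text{ constraints}
\end{align*}
where $\Gamma = \begin{pmatrix} \Gamma_{11} & \Gamma_{12} \\ \Gamma_{21} & \Gamma_{22} \end{pmatrix}$.
Here the pseudo-true values of θ are defined through the equation:
\[
\theta_0 = \arg\min_{\alpha_0, \alpha, \tilde\beta_2, \Gamma_{11}, \tilde\Gamma_{12}, \tilde\Gamma_{22}} \sum_{i=1}^{n} \Big( -\ln y_i^{*,1} - \big[ \alpha_0 + \alpha' \ln x_i + \tilde\beta_2' \ln \tilde y_i^2 + \tfrac{1}{2} \ln x_i' \, \Gamma_{11} \ln x_i + \ln x_i' \, \tilde\Gamma_{12} \ln \tilde y_i^2 + \tfrac{1}{2} \ln \tilde y_i^{2\prime} \, \tilde\Gamma_{22} \ln \tilde y_i^2 \big] \Big)^2 .
\]
where: $\beta = (\beta_1 \;\; \tilde\beta_2')'$, $\Gamma_{12} = (a \;\; \tilde\Gamma_{12})$ and $\Gamma_{22} = \begin{pmatrix} c_{11} & c_2' \\ c_2 & \tilde\Gamma_{22} \end{pmatrix}$
or by $y_i^1 / \hat\delta_{m,n}(x_i, y_i)$ or by $y_i^1 / \hat\delta_{\alpha,n}(x_i, y_i)$ if more robust estimators are desired.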
The estimator for (β1 , a, c) will be recovered from the homogeneity con-
straints:
Finally, at the end of the procedure we now have the full parameter estimates $\hat\alpha_0, \hat\alpha, \hat\beta, \hat\Gamma$. By applying the results of Florens and Simar (2005) and of
Daouia, Florens and Simar (2005), we have all the desired statistical inference
based on the appropriate normal asymptotic distribution. Here too, the bootstrap
is used in practice to make inference on the parameters θ for the partial frontiers
approximations (order-α or order-m). All the details of the algorithm have been
presented in Section 4.6.2 above. They are easily adaptable to this multivariate
setup. Again the bootstrap can be used either to estimate the variance Vm or
Vα of the estimators in the asymptotic normal approximation, or to provide
directly percentile confidence intervals. The method is illustrated on real data
in Section 7.5.
Chapter 5
This chapter deals with the important topic of the introduction of external-environmental variables in frontier models. It explains why these factors are important for comparative efficiency analysis and shows the potential of some recently introduced diagnostic tools for capturing their impact on the performance of the analysed firms.
The evaluation of the influence of external-environmental factors on the effi-
ciency of producers is a relevant issue related to the explanations of efficiency,
the identification of economic conditions that create inefficiency, and finally to
the improvement of managerial performance.
The meaning and the economic role played by external-environmental variables are strictly linked to the economic field in which firms operate. The choice of the environmental variables has to be made on a case-by-case basis, with a good knowledge of the characteristics of the production process and taking into account the economic field of application.
From an economic point of view, we are interested in the evaluation of the influence of Z variables on the performance of the firms. To be able to evaluate this influence we have, firstly, to introduce the variables in the frontier estimation problem and then to address questions such as the following: “Is the production process (and then the efficiency scores of firms) affected by the Z variables?”; if the answer to this question is yes, “How can we evaluate their influence?”.
The aim of this chapter is therefore to present “how” environmental variables
can be introduced in the probabilistic formulation of efficient frontier estimation
(described in the previous chapter) and to propose “a way” to operationalize
their introduction. The first purpose (introduction of Z variables) is addressed
in a following section which presents a full set of conditional (to Z) measures
of efficiency in a full frontiers (DEA/FDH) setting and in a robust (or partial)
96 Conditional measures of efficiency
frontiers (order-m and order-α) setting. The second purpose (measuring the influence of Z) is handled through an econometric methodology to follow in practical applications, based on the evaluation of the global influence of Z on the production process, and on a decomposition of the conditional (full or partial) efficiency of a firm into indicators with an interesting economic meaning. In particular, we propose to decompose the conditional efficiency of a firm into the unconditional (full or robust) efficiency score, an externality index, which measures the environmental conditions in which the firm operates (i.e. favorable vs. unfavorable), and a producer intensity index, which measures the individual level of exploitation of the “environmental conditions” (i.e. opportunities vs. threats of the “environment”). This decomposition is particularly useful to facilitate comparative economic analysis of DMUs’ performance, as shown in the applications reported in Part II of this book.
Here we complete the approach proposed by Daraio and Simar (2005a, 2005b) for full and order-m frontiers (extending previous results by Cazals, Florens and Simar, 2002), applied to order-α frontiers by Daouia and Simar (2004). In particular we discuss at length and explain in detail how to evaluate the impact of multivariate Z on the full and robust productive efficiency of DMUs. Furthermore, a new conditional probabilistic efficiency measure is introduced in Section 5.2.3.
After a brief overview of the relevant literature on the introduction of external-environmental variables in nonparametric frontier models reported in the next section, Section 5.2.3 introduces a complete set of “conditional” measures of efficiency, i.e. efficiency measures which take into account these external-environmental variables, and presents their computational aspects. This presentation will allow the applied economist to grasp the basic mechanism of the techniques. After that, Section 5.3 describes how to select the bandwidth to compute the conditional estimators for both univariate and multivariate Z. Afterwards, Section 5.4 explains in detail and with simple illustrations the econometric methodology useful to interpret the plots and decompose the conditional efficiency measures. Finally, a series of simulation exercises illustrates the usefulness of the methodology and how to implement the proposed approach in practice.
21 For a detailed discussion on the interplay between environmental conditions and internal conditions see Morroni (2006), p. 31 ff.
behind patterns” (as Bartelsman and Doms, 2000 call them) has started to be
studied in literature only during the last decades22 .
An explicit reference to the importance of environmental variables can be found in Lewin and Minton (1986), who define a research agenda for determining organizational effectiveness, whose main points are: 1) to be capable of
analytically identifying relatively most effective organizations in comparison
to relatively least effective organizations; 2) to be capable of deriving a single
summary measure of relative effectiveness of organizations in terms of their
utilization of resources and their environmental factors to produce desired out-
comes; 3) to be able to handle noncommensurate, conflicting multiple outcome
measures, multiple resource factors and multiple environmental factors outside
the control of the organization being evaluated; and not be dependent on a set
of a priori weights or prices for the resources utilized, environmental factors or
outcome measures; 4) to be able to handle qualitative factors such as participant
satisfaction, extent of information processing available, degree of competition,
etc.; 5) to be able to provide insight as to factors which contribute to relative
effectiveness ratings; 6) to be able to maintain equity in the evaluation.
As we have seen above, the exploration of the reasons for productivity/efficiency differentials across production units is a relevant issue. When these external factors $Z \in \mathbb{R}^r$ are continuous, two main approaches have been proposed in the literature, but both are flawed by restrictive prior assumptions on the DGP and/or on the role of these external factors in the production process.
The first family of models is based on a one-stage approach (see e.g. Banker and Morey, 1986a; Banker and Morey, 1986b for categorical external factors; Färe, Grosskopf, Lovell and Pasurka, 1989; Färe, Grosskopf and Lovell, 1994, p. 223-226), where these factors Z are considered as free disposal inputs and/or outputs which contribute to defining the attainable set $\Psi \subset \mathbb{R}^p_+ \times \mathbb{R}^q_+ \times \mathbb{R}^r$, but which are not active in the optimization process defining the efficiency scores. In this case, the efficiency scores conditional on Z are:
\[
\theta(x, y \mid z) = \inf\{\theta \mid (\theta x, y, z) \in \Psi\}, \tag{5.1}
\]
and the estimator of Ψ is defined as above by adding the variables Z in defining the FDH and/or the DEA enveloping set. Here the variable Z is considered as an input if it is conducive (favorable, advantageous, beneficial) to efficiency and as an output if it is detrimental (damaging, unfavorable) to efficiency. The drawback of this approach is twofold: first, we have to know a priori what the role of Z in the production process is, and second, we have to assume the free disposability (and possibly convexity, if DEA is used) of the corresponding extended attainable set Ψ.
22 For the introduction of external-environmental variables in parametric frontier models, see Kumbhakar and Lovell (2000).
Explaining efficiency in the literature 99
$\Psi \times \mathbb{R}^r$ and hence, the value of Z influences neither the attainable set Ψ nor the position of the frontier of the attainable set: Z acts only on the stochastic process pushing firms far from the frontier.
Second, the regression in the second stage relies on some parametric assumptions (such as a linear model and a truncated normal error term in most studies; recently, Park, Simar and Zelenyuk (2006) have proposed a nonparametric approach using maximum likelihood techniques). In the next section we see how to avoid these limitations.
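\[
\Psi^z = \{(x', y) \in \mathbb{R}^{p+q}_+ \mid x' \ge x^{\partial,z}(y) \text{ for } (x, y) \in \Psi\}, \tag{5.4}
\]
where $x^{\partial,z}(y)$ is the efficient level of input, conditional on Z = z, for an output level y: $x^{\partial,z}(y) = \theta(x, y \mid z)\, x$, where $(x, y) \in \Psi$. Clearly, $\Psi^z \subseteq \Psi$.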
where K(·) is the kernel and h is the bandwidth of appropriate size23 . Hence,
we obtain the “conditional FDH efficiency measure” as follows:
\[
\hat\theta_{FDH}(x, y \mid z) = \inf\{\theta \mid \hat F_{X|Y,Z,n}(\theta x \mid y, z) > 0\}. \tag{5.6}
\]
Daraio and Simar (2005a) pointed out that for kernels with unbounded support, like the Gaussian kernel, it is easy to show that $\hat\theta_{FDH}(x, y \mid z) \equiv \hat\theta_{FDH}(x, y)$: the estimate of the full-frontier efficiency is unable to detect any influence of the environmental factors. Therefore, in this framework of conditional boundary estimation, kernels with compact support have to be used.
For any (symmetric) kernel with compact support (i.e., $K(u) = 0$ if $|u| > 1$, as for the uniform, triangle, Epanechnikov or quartic kernels, see e.g. Silverman, 1986), the conditional FDH efficiency estimator is given by:
\begin{align}
\hat\theta_{FDH}(x, y \mid z) &= \inf\{\theta \mid \hat F_{X|Y,Z,n}(\theta x \mid y, z) > 0\} \tag{5.7} \\
&= \min_{\{i \mid Y_i \ge y,\ |Z_i - z| \le h\}} \ \max_{j = 1, \ldots, p} \frac{X_i^j}{x^j}. \tag{5.8}
\end{align}
In this framework, the estimation of conditional full frontiers does not depend
on the chosen kernel but only on the selected bandwidth. This will be different
for the conditional order-m and order-α measures defined below.
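The enumerative formula (5.8) can be illustrated with a short sketch. It assumes a univariate Z, inputs stored as an (n, p) array and outputs as an (n, q) array; the function name `conditional_fdh_input` and the convention of returning NaN when no observation satisfies both conditions are illustrative choices, not part of the text.

```python
import numpy as np

def conditional_fdh_input(x, y, z, X, Y, Z, h):
    """Conditional FDH input efficiency as in (5.8), for a kernel with support [-1, 1]."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    Z = np.asarray(Z, dtype=float)
    # Observations producing at least y and lying within the bandwidth window around z
    keep = np.all(Y >= y, axis=1) & (np.abs(Z - z) <= h)
    if not keep.any():
        return np.nan  # assumption: no comparable unit in the window
    # min over retained units of the largest input ratio X_i^j / x^j
    return np.min(np.max(X[keep] / x, axis=1))
```

As the formula indicates, the result depends on the bandwidth `h` only through the set of retained observations, not on the shape of the kernel.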
The conditional attainable set $\Psi^z$ is estimated by:
\[
\hat\Psi^z_{FDH} = \{(x', y) \in \mathbb{R}^{p+q}_+ \mid x' \ge \hat x^{\partial,FDH,z}(y) \text{ for } (x, y) \in \hat\Psi_{FDH}\}, \tag{5.9}
\]
where $\hat x^{\partial,FDH,z}(y)$ is the estimated conditional efficient level of inputs:
Daraio and Simar (2005b) also define a conditional DEA estimator and address its computational issues. Consistency and asymptotic properties of these conditional estimators are investigated in Jeong, Park and Simar (2006).
23 Issues about the practical choice of the bandwidth are discussed in Section 5.3.
Note that this set depends on the value of z since the $X_i$ are generated through the conditional distribution function. For any $x \in \mathbb{R}^p_+$, the conditional order-m input efficiency measure given that Z = z, denoted by $\theta_m(x, y \mid z)$, is then defined as:
\[
\theta_m(x, y \mid z) = E_{X|Y,Z}\big(\tilde\theta^z_m(x, y) \mid Y \ge y, Z = z\big), \tag{5.11}
\]
where $\tilde\theta^z_m(x, y) = \inf\{\theta \mid (\theta x, y) \in \Psi^z_m(y)\}$ and the expectation is relative to the distribution $F_{X|Y,Z}(\cdot \mid y, z)$. It is shown by Daraio and Simar (2005a) (Theorem 3.1) that $\theta_m(x, y \mid z)$ converges to $\theta(x, y \mid z)$ when $m \to \infty$.
A nonparametric estimator of $\theta_m(x, y \mid z)$ is provided by plugging in the nonparametric estimator of $F_{X|Y,Z}(x \mid y, z)$ proposed in (5.5), which depends on the kernel and on the chosen bandwidth. Formally, the estimator can be obtained by:
\begin{align}
\hat\theta_{m,n}(x, y \mid z) &= \hat E_{X|Y,Z}\big(\tilde\theta^z_m(x, y) \mid y, z\big) \tag{5.12} \\
&= \int_0^\infty \big(1 - \hat F_{X|Y,Z,n}(u x \mid y, z)\big)^m \, du. \tag{5.13}
\end{align}
Computational aspects
As we described in Section 4.2 the computation of order-m efficiency θ̂m,n
(x, y) could be done either by evaluating the univariate integral (4.23) via nu-
merical methods, or by an easy Monte-Carlo algorithm.
Similarly, the conditional order-m efficiency θ̂m,n (x, y | z) can be computed
either evaluating the integral (5.13) by numerical methods or using an adapted
version of the Monte-Carlo algorithm recalled above. This Monte-Carlo algo-
rithm for the conditional input order-m efficiency works as follows. Suppose
that h is the chosen bandwidth for a particular kernel K(·):
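One way such a Monte-Carlo scheme can be organised is sketched below. It is an illustration under assumptions rather than the algorithm as printed: m observations are drawn with replacement among the $X_i$ with $Y_i \ge y$, with probabilities proportional to the kernel weights $K((z - Z_i)/h)$; the minimum over the draws of $\max_j X_i^j / x^j$ is computed; and the result is averaged over B replications. The Epanechnikov kernel, the univariate Z and the names `epanechnikov` and `conditional_order_m_input` are choices made for this example.

```python
import numpy as np

def epanechnikov(u):
    """Epanechnikov kernel, compactly supported on [-1, 1]."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u ** 2), 0.0)

def conditional_order_m_input(x, y, z, X, Y, Z, h, m, B=200, seed=None):
    """Monte-Carlo approximation of the integral (5.13)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    Z = np.asarray(Z, dtype=float)
    dominating = np.all(Y >= y, axis=1)           # units producing at least y
    weights = epanechnikov((z - Z[dominating]) / h)
    if weights.sum() <= 0:
        return np.nan                             # assumption: no unit in the window
    ratios = np.max(X[dominating] / x, axis=1)    # max_j X_i^j / x^j for each unit
    probs = weights / weights.sum()
    # B replications of m kernel-weighted draws; each replication keeps its minimum
    draws = rng.choice(ratios, size=(B, m), replace=True, p=probs)
    return draws.min(axis=1).mean()
```

The sketch relies on the fact that the integral in (5.13) equals the expected minimum of m independent draws from the estimated conditional distribution, so increasing B reduces only the simulation error.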
Introducing external-environmental variables 103
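\[
\hat F_{X|Y,Z,n}(\theta x \mid y, z) = \frac{1}{Q_{y,z}} \sum_{j=1}^{M_y} \mathbf{1}\big(X^y_{(j)} \le \theta\big)\, K\big((z - Z^y_{[j]})/h_n\big)
= \begin{cases}
0 & \text{if } \theta < X^y_{(1)} \\
\ell_k & \text{if } X^y_{(k)} \le \theta < X^y_{(k+1)}, \quad k = 1, \ldots, M_y - 1 \\
1 & \text{if } \theta \ge X^y_{(M_y)},
\end{cases}
\]
where, for $j = 1, \ldots, M_y$, $Z^y_{[j]}$ denotes the observation $Z_i$ corresponding to the order statistic $X^y_{(j)}$, $Q_{y,z} = \sum_{i=1}^{n} \mathbf{1}(Y_i \ge y)\, K\big(\frac{z - Z_i}{h_n}\big)$, and finally $\ell_k = (1/Q_{y,z}) \sum_{j=1}^{k} K\big((z - Z^y_{[j]})/h_n\big)$.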
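Thus the nonparametric estimator of the conditional order-α input efficiency measure given that Z = z is given by:
\[
\hat\theta_{\alpha,n}(x, y \mid z) = \begin{cases}
X^y_{(1)} & \text{if } 0 \le 1 - \alpha < \ell_1 \\
X^y_{(k+1)} & \text{if } \ell_k \le 1 - \alpha < \ell_{k+1}, \quad k = 1, \ldots, M_y - 1.
\end{cases} \tag{5.15}
\]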
Again this is easy and very fast to compute since it is only based on enumerative
algorithms. Daouia and Simar (2004) have proven the consistency of these
conditional estimators.
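The enumerative computation behind (5.15) can be sketched as follows: sort the ratios $X^y_i = \max_j X_i^j / x^j$ of the units with $Y_i \ge y$, accumulate their normalised kernel weights, and return the first order statistic whose cumulative weight exceeds $1 - \alpha$. The Epanechnikov kernel, the univariate Z and the function names are assumptions made for the example.

```python
import numpy as np

def epanechnikov(u):
    """Epanechnikov kernel, compactly supported on [-1, 1]."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u ** 2), 0.0)

def conditional_order_alpha_input(x, y, z, X, Y, Z, h, alpha):
    """Conditional order-alpha input efficiency following the case formula (5.15)."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    Z = np.asarray(Z, dtype=float)
    dominating = np.all(Y >= y, axis=1)
    ratios = np.max(X[dominating] / x, axis=1)
    weights = epanechnikov((z - Z[dominating]) / h)
    total = weights.sum()                      # plays the role of Q_{y,z}
    if total <= 0:
        return np.nan                          # assumption: no unit in the window
    order = np.argsort(ratios)
    sorted_ratios = ratios[order]
    cum = np.cumsum(weights[order]) / total    # cumulative weights of the order statistics
    # first (0-based) index whose cumulative weight is strictly above 1 - alpha
    k = np.searchsorted(cum, 1.0 - alpha, side="right")
    k = min(k, len(cum) - 1)                   # guard against rounding at the upper end
    return sorted_ratios[k]
```

With α = 1 the function returns the smallest ratio carrying a positive kernel weight, which coincides with the conditional FDH score.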
In other words, one may set the estimated conditional performance measure for a unit operating at the level (x, y) facing the environmental conditions z, to be the order $\alpha^z$ of the estimated conditional quantile frontier which passes through this unit. This new measure of conditional efficiency, the conditional α efficiency, may be estimated by the following quantity:
\[
\hat\alpha^{input}_{z,n} = 1 - \ell_{k-1}, \tag{5.17}
\]
where k is the index such that $X^y_{(k)} = 1$, $\ell_k$ was defined above and we set $\ell_0 = 0$.
This new measure has an appealing economic interpretation. The quantity $(1 - \alpha^z_{input}(x, y))$ is the firm (x, y)’s probability of being dominated in the inputs space, given its level of outputs, taking into account its external-environmental conditions Z = z.
Hence, a simple indicator of the impact of external factors on firms’ performance may be $\alpha Q^z = \alpha^z_{input}(x, y) / \alpha_{input}(x, y)$, where $\alpha_{input}(x, y)$, defined in (4.28), is the
– If $\alpha Q^z > 1$, then $(1 - \alpha^z_{input}(x, y)) < (1 - \alpha_{input}(x, y))$. In this case, for the firm operating at the level (x, y), the probability of being dominated given the condition Z = z is lower than that of being dominated without taking into account the external conditions.
– If $\alpha Q^z = 1$, then $(1 - \alpha^z_{input}(x, y)) = (1 - \alpha_{input}(x, y))$. Here, the probabilities are equal, hence it seems that the external conditions do not play any role.
– If $\alpha Q^z < 1$, then $(1 - \alpha^z_{input}(x, y)) > (1 - \alpha_{input}(x, y))$. In this situation, the probability of being dominated given the condition Z = z is higher than that of being dominated without taking into account the external conditions.
An application of this new measure and related indicators on real data is reported
in Chapter 8, where their usefulness is also shown.
In Table 5.1 we report all the nonparametric and robust measures of efficiency
introduced in this book as well as the main related references. The presentation
in the table is done for the input oriented framework. However, most of the cited references report or at least outline the corresponding output oriented measures that will be described in the following section.
\[
\hat S_{Y|X,n}(y \mid x) = \frac{\sum_{i=1}^{n} \mathbf{1}(X_i \le x, Y_i \ge y)}{\sum_{i=1}^{n} \mathbf{1}(X_i \le x)}.
\]
Then, the FDH estimator of the output efficiency score for a given point (x, y) can be written as $\hat\lambda_n(x, y) = \sup\{\lambda \mid \hat S_{Y|X,n}(\lambda y \mid x) > 0\}$. Mutatis mutandis, all the output oriented measures share the same properties as their input oriented counterparts.
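As an illustration, the empirical survivor function and the resulting FDH output score can be computed by simple enumeration: the supremum of the λ for which $\hat S_{Y|X,n}(\lambda y \mid x) > 0$ is the largest value of $\min_j Y_i^j / y^j$ among the units with $X_i \le x$. The sketch assumes (n, p) and (n, q) data arrays; the function names are hypothetical.

```python
import numpy as np

def survivor_y_given_x(y, x, X, Y):
    """Empirical conditional survivor function S_hat(y | x)."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    uses_less = np.all(X <= x, axis=1)
    if not uses_less.any():
        return np.nan  # assumption: undefined when no unit uses at most x
    return np.mean(np.all(Y[uses_less] >= y, axis=1))

def fdh_output_efficiency(x, y, X, Y):
    """FDH output score: max over units with X_i <= x of min_j Y_i^j / y^j."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    uses_less = np.all(X <= x, axis=1)
    if not uses_less.any():
        return np.nan
    return np.max(np.min(Y[uses_less] / y, axis=1))
```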
Table 5.1. A summary of nonparametric and robust efficiency measures presented in this book
with the most important references. Input orientation.
Unconditional Conditional
Measures Measures
$\hat\alpha^{input}_{n}$    $\hat\alpha^{input}_{z,n}$
Aragon et al. (2003) this book
Daouia and Simar (2004)
and this book
Conditional full-frontier
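Note that this random set depends on the value of z since the $Y_i$ are generated through $S_{Y|X,Z}(y \mid x, z)$. Then, for any y, we may define $\tilde\lambda^z_m(x, y) = \sup\{\lambda \mid (x, \lambda y) \in \Psi^z_m(x)\}$. The conditional order-m output efficiency measure is defined as:
where again it can be shown that $\lim_{m\to\infty} \lambda_m(x, y \mid z) = \lambda(x, y \mid z)$. A nonparametric estimator of $\lambda_m(x, y \mid z)$ is given by:
\begin{align}
\hat\lambda_{m,n}(x, y \mid z) &= \int_0^\infty \Big[ 1 - \big(1 - \hat S_{Y|X,Z,n}(u y \mid x, z)\big)^m \Big] \, du \notag \\
&= \hat\lambda_n(x, y \mid z) - \int_0^{\hat\lambda_n(x, y \mid z)} \big(1 - \hat S_{Y|X,Z,n}(u y \mid x, z)\big)^m \, du. \tag{5.22}
\end{align}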
The Monte Carlo algorithm presented in Section 5.2.2 can be easily adapted
to the output orientation.
Similarly, we can define the conditional order-α output efficiency measure given
that Z = z as:
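Here also we have $\lim_{\alpha\to 1} \hat\lambda_{\alpha,n}(x, y \mid z) = \hat\lambda_n(x, y \mid z)$. The computation of the estimator can be described as follows, by using the notations introduced in Chapter 4. For $j = 1, \ldots, N_x$, denote by $Z^x_{[j]}$ the observation $Z_i$ corresponding to the order statistic $Y^x_{(j)}$, and let $R_{x,z} = \sum_{i=1}^{n} \mathbf{1}(X_i \le x)\, K\big(\frac{z - Z_i}{h_n}\big) > 0$. Then it can be shown that:
\[
\hat S_{Y|X,Z,n}(\lambda y \mid x, z) = \frac{1}{R_{x,z}} \sum_{j=1}^{N_x} \mathbf{1}\big(\lambda \le Y^x_{(j)}\big)\, K\big((z - Z^x_{[j]})/h_n\big)
= \begin{cases}
1 & \text{if } \lambda \le Y^x_{(1)} \\
L_{k+1} & \text{if } Y^x_{(k)} < \lambda \le Y^x_{(k+1)}, \quad k = 1, \ldots, N_x - 1 \\
0 & \text{if } \lambda > Y^x_{(N_x)},
\end{cases}
\]
where $L_{k+1} = (1/R_{x,z}) \sum_{j=k+1}^{N_x} K\big((z - Z^x_{[j]})/h_n\big)$. The estimator is then computed as follows:
\[
\hat\lambda_{\alpha,n}(x, y \mid z) = \begin{cases}
Y^x_{(k)} & \text{if } L_{k+1} \le 1 - \alpha < L_k, \quad k = 1, \ldots, N_x - 1 \\
Y^x_{(N_x)} & \text{if } 0 \le 1 - \alpha < L_{N_x}.
\end{cases} \tag{5.24}
\]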
Here, for every attainable point $(x, y) \in \Psi$, there exists an $\alpha^z$ such that $\lambda_{\alpha^z}(x, y \mid z) = 1$; this $\alpha^z$ could serve as an alternative measure of conditional output efficiency score. If $S_{Y|X,Z}(y \mid x, z)$ is continuous in y, this quantity is given, as for the input orientation, by:
\[
\alpha^z_{output}(x, y) = 1 - S_{Y|X,Z}(y \mid x, z). \tag{5.25}
\]
It may be estimated by the following quantity:
\[
\hat\alpha^{output}_{z,n} = 1 - L_{k+1}, \tag{5.26}
\]
where k is the index such that $Y^x_{(k)} = 1$, $L_{k+1}$ was defined above and we set $L_{N_x+1} = 0$.
that the choice of the bandwidth may be crucial. We have already discussed in Section 3.4 the bandwidth selection issue for bootstrapping DEA efficiency scores, where we also reported some simple rules of thumb which can be easily put in place and seem to work quite well. Here, for the computation of conditional measures of efficiency, we propose a very simple and easy to compute rule based on a k-Nearest Neighbor (k-NN) method to select the bandwidth. We will present the ideas in the simplest case where Z is univariate and where a family of continuous kernels with compact support is available (such as the triangular, quartic or Epanechnikov kernels), and then we will adapt the presentation for multivariate Z with an easy to implement kernel with compact support based on truncated multivariate normal kernels.
Since there are no reasons to motivate the choice of which observation to leave
out, the log likelihood is averaged over each choice of omitted Xi , to give the
following score function:
\[
CV(h) = n^{-1} \sum_{i=1}^{n} \log\big( \hat f_h^{(-i)}(X_i) \big).
\]
and $h_{Z_i}$ is the local bandwidth chosen such that there exist k points $Z_j$ verifying $|Z_j - Z_i| \le h_{Z_i}$.
Afterwards, in a second step, in order to compute $\hat F_{X|Y,Z,n}(x \mid y, z)$ (and $\hat S_{Y|X,Z,n}(y \mid x, z)$ for the output-oriented case), we have to take into account the dimensionality of x and y, and the sparsity of points in larger dimensional spaces. Consequently, we expand the local bandwidth $h_{Z_i}$ by a factor $1 + n^{-1/(p+q)}$, increasing with (p + q) but decreasing with n.
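The two steps can be sketched for univariate Z as follows. The sketch assumes that k has already been chosen (for instance by likelihood cross-validation) and that the k points within distance $h_{Z_i}$ of $Z_i$ include $Z_i$ itself; the text reproduced here does not settle that counting convention, and the function name `knn_local_bandwidths` is hypothetical.

```python
import numpy as np

def knn_local_bandwidths(Z, k, p, q):
    """Local k-NN bandwidths for univariate Z, expanded by 1 + n^(-1/(p+q))."""
    Z = np.asarray(Z, dtype=float)
    n = len(Z)
    # Sorted absolute distances from each Z_i to every observation (itself included)
    dist = np.sort(np.abs(Z[:, None] - Z[None, :]), axis=1)
    h_local = dist[:, k - 1]   # smallest radius containing k points around Z_i
    # Second step: enlarge the window to account for the dimension of (x, y)
    return h_local * (1.0 + n ** (-1.0 / (p + q)))
```

For example, with four observations and p + q = 2 the expansion factor is 1 + 4^(-1/2) = 1.5.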
The issue of choosing an optimal bandwidth in this setup is still an open
research question but the empirical method that we propose here turns out to
provide very sensible results, as shown by our simulated examples (see below)
and in most of our applications with real data (see Part II of this book). See also
the comments at the end of the next section devoted to the multivariate case.
The problem is that each component of Z has its own dispersion and so the bandwidths should be scaled accordingly for each component. More generally, if one wants to estimate an r-dimensional density by kernel smoothing, one must choose a kernel function K(u) where $u \in \mathbb{R}^r$, such that $K(u) \ge 0$ and $\int_{\mathbb{R}^r} K(u)\, du = 1$. Then we have to select a bandwidth matrix H which has to be an (r × r) positive definite matrix. The scaled kernel function can then be written as $K_H(u) = |H|^{-1} K(H^{-1} u)$ where |H| stands for the determinant of the matrix H. Then a density estimate for Z could be written as:
\[
\hat f(z) = \frac{1}{n} \sum_{i=1}^{n} K_H(Z_i - z) = \frac{1}{n |H|} \sum_{i=1}^{n} K\big(H^{-1}(Z_i - z)\big).
\]
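\[
\hat f(z) = \frac{1}{n h^r \prod_{j=1}^{r} s_j} \sum_{i=1}^{n} \prod_{j=1}^{r} K\Big( \frac{Z_i^j - z^j}{h s_j} \Big),
\]
\[
\hat f(z) = \frac{1}{n (2\pi)^{r/2} h^r |S|^{1/2}} \sum_{i=1}^{n} \exp\Big\{ -\frac{1}{2h^2} (Z_i - z)' S^{-1} (Z_i - z) \Big\}.
\]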
multivariate normal kernel approach. The idea is very simple: we truncate the basic Gaussian kernel K(u) on a sphere of radius one and, in order to obtain a continuous kernel at the boundary (which is preferable when estimating continuous densities), we rescale it so that the truncated kernel is equal to zero on the boundary sphere defined by $u'u = 1$. After some analytical manipulations, this leads to a new basic kernel bounded on the sphere of radius one defined as:
\[
K^*(u) = \frac{\exp\{-u'u/2\} - \exp\{-1/2\}}{C - \exp\{-1/2\}\, \dfrac{\pi^{r/2}}{\Gamma(1 + r/2)}} \; \mathbf{1}(u'u \le 1),
\]
\[
K^*_H(u) = \frac{1}{h^r |S|^{1/2} \Big( C - \exp\{-1/2\}\, \dfrac{\pi^{r/2}}{\Gamma(1 + r/2)} \Big)} \times \Big( \exp\Big\{ -\frac{1}{2h^2} u' S^{-1} u \Big\} - \exp\{-1/2\} \Big)\, \mathbf{1}(u' S^{-1} u \le h^2). \tag{5.27}
\]
This is a truncated normal distribution, truncated at the ellipsoid $u' S^{-1} u \le h^2$ of “radius” h, the density being scaled so that it is continuous (equal to zero) on its boundary. Finally, the expression for the density estimate of Z may be written as above as:
\[
\hat f(z) = \frac{1}{n} \sum_{i=1}^{n} K^*_H(Z_i - z). \tag{5.28}
\]
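For the univariate case r = 1 the truncated kernel and the density estimate can be sketched with the standard library and NumPy only. The constant C is not defined in the text reproduced here; the sketch assumes that C is the integral of $\exp\{-u'u/2\}$ over the unit ball, which is the value that makes $K^*$ integrate to one. For r = 1 this gives $C = \sqrt{2\pi}\,\mathrm{erf}(1/\sqrt{2})$, and $\pi^{r/2}/\Gamma(1 + r/2) = 2$ is the length of $[-1, 1]$. The function names and the scalar scale `s` (so that $S = s^2$) are illustrative.

```python
import math
import numpy as np

def truncated_gaussian_kernel_1d(u):
    """K*(u) for r = 1: Gaussian truncated to [-1, 1] and shifted to vanish at |u| = 1."""
    u = np.asarray(u, dtype=float)
    c = math.sqrt(2.0 * math.pi) * math.erf(1.0 / math.sqrt(2.0))  # assumed meaning of C
    volume = math.pi ** 0.5 / math.gamma(1.5)                      # equals 2 when r = 1
    norm = c - math.exp(-0.5) * volume
    inside = np.abs(u) <= 1.0
    return np.where(inside, (np.exp(-u ** 2 / 2.0) - math.exp(-0.5)) / norm, 0.0)

def density_estimate_1d(z, Z, h, s):
    """Density estimate (5.28) for r = 1, using K*_H(u) = K*(u / (h s)) / (h s)."""
    u = (np.asarray(Z, dtype=float) - z) / (h * s)
    return truncated_gaussian_kernel_1d(u).mean() / (h * s)
```

Only observations with $|Z_i - z| \le h s$ contribute to the estimate, which is the compact-support property needed for the conditional frontier estimators.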
Here again only one bandwidth has to be selected; we use the k-nearest neighbor principle: we select a local bandwidth $h_z$ such that the ellipsoid centered in z with shape matrix $S^{-1}$ and “radius” $h_z$ contains exactly k data points $Z_i$. An optimal value for k is then obtained as explained above by likelihood cross-validation. Of course, as for the univariate case, once $h_i$ is determined, we correct it in a second step to expand the local bandwidth $h_i$ by a factor $1 + n^{-1/(p+q)}$.
In all our applications below, this is the method we have used, even for the particular case of r = 1. The method provided very sensible results and nice estimators of the density of Z (uni- and multivariate), and the conditional efficiency scores that derive from this kernel and bandwidth choice showed great stability to small changes in k. We also compared with some of the empirical rules proposed above and the results were often similar, although in some cases (dependence among the Z’s) the latter could be a wrong choice.
An econometric methodology 113
The same remark about the stability of the results with respect to the bandwidth choice in this setup was already made by Simar and Wilson (1999c) for Malmquist indices, and by Daraio and Simar (2005a,b) for the conditional and convex efficiency measures, compared, for the univariate bandwidth case, with the Sheather and Jones (1991) method.
where $Q^z_i = \dfrac{\hat\theta_n(X_i, Y_i \mid Z_i)}{\hat\theta_n(X_i, Y_i)}$, $\epsilon_i$ is the usual error term with $E(\epsilon_i \mid Z_i) = 0$, and g is the mean regression function, since $E(Q^z_i \mid Z_i) = g(Z_i)$.
In this exploratory phase we choose the simple smoothed nonparametric
regression estimator introduced by Nadaraya (1964) and Watson (1964). The
25 We can do the same with the differences $\hat\theta_n(x, y \mid z) - \hat\theta_n(x, y)$, but since efficiency scores are proportions, ratios seem very natural.
\[
\hat g(z) = \frac{\sum_{i=1}^{n} K\big(\frac{z - Z_i}{h}\big)\, Q^z_i}{\sum_{i=1}^{n} K\big(\frac{z - Z_i}{h}\big)}. \tag{5.30}
\]
\[
\hat g(z) = \sum_{i=1}^{n} \Big( \frac{w_i}{\sum_{j=1}^{n} w_j} \Big) Q^z_i, \tag{5.31}
\]
where $w_i = K\big(\frac{z - Z_i}{h}\big)$, and hence, it is a linear combination of the $Q^z_i$. Therefore, the mean smoothed regression function is a weighted average of the $Q^z_i$, where the weights are represented by the kernels.
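The weighted-average form (5.31) translates directly into code. The sketch below assumes a univariate Z, a fixed bandwidth h and an Epanechnikov kernel; these choices and the function names are illustrative, and any other kernel could be substituted.

```python
import numpy as np

def epanechnikov(u):
    """Epanechnikov kernel, compactly supported on [-1, 1]."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u ** 2), 0.0)

def nadaraya_watson(z_grid, Z, Q, h):
    """Nadaraya-Watson estimate of g(z) = E(Q^z | Z = z) on a grid of z values."""
    z_grid = np.asarray(z_grid, dtype=float)
    Z = np.asarray(Z, dtype=float)
    Q = np.asarray(Q, dtype=float)
    weights = epanechnikov((z_grid[:, None] - Z[None, :]) / h)   # w_i for each grid point
    denom = weights.sum(axis=1)
    with np.errstate(invalid="ignore", divide="ignore"):
        # NaN where no observation falls inside the kernel window
        return np.where(denom > 0, (weights @ Q) / denom, np.nan)
```

Plotting the returned values against `z_grid`, together with the scatter of the ratios, gives the kind of picture used to read off whether the regression is increasing or decreasing in Z.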
Accordingly, for the output oriented case, we have the same Nadaraya-
Watson estimator of the regression function g, where:
\[
Q^z_i = \frac{\hat\lambda_n(X_i, Y_i \mid Z_i)}{\hat\lambda_n(X_i, Y_i)}.
\]
that–one computed taking into account Z; in this last case, the effect of Z makes the efficiency score go up. Consequently, the ratios $Q^z = \hat\theta_n(x, y \mid z) / \hat\theta_n(x, y)$ will increase, on average, with Z.
In the second case (favorable Z), the environmental variable plays a role of a
“substitutive” input in the production process, giving the opportunity to “save”
inputs in the activity of production; in this case, Z has a “positive” effect on
the production process. It follows that the conditional efficiency θ̂n (x, y | z)
will be much larger than θ̂n (x, y) for small values of Z (less substitutive inputs)
than for large values of Z. Here again, this is due to the fact that firms with
a small value of Z do not exploit the positive effect of Z, and then, when
we take into account Z, their efficiency score goes up. Therefore, the ratios
Qz = θ̂n (x, y | z)/θ̂n (x, y) will, on average, decrease when Z increases.
Since we know that full-frontier estimates, and the derived estimated efficiency scores, are very sensitive to outliers and extreme values, we also carry out the same analysis for the more robust order-m and order-α efficiency scores. Thus, in the empirical illustrations reported in the following section, we also present the nonparametric smoothed regression of the ratios $Q^z_m = \hat\theta_{m,n}(x, y \mid z) / \hat\theta_{m,n}(x, y)$ on Z and of $Q^z_\alpha = \hat\theta_{\alpha,n}(x, y \mid z) / \hat\theta_{\alpha,n}(x, y)$ respectively.
Mutatis mutandis, the same could be done in the output oriented case, with similar conclusions to detect the influence of Z on efficiency. In this case, the influence of Z goes in the opposite direction: an increasing regression corresponds to a favorable environmental factor and a decreasing regression indicates an unfavorable factor. In an output oriented framework, a favorable Z means that the environmental variable operates as a sort of “extra” input freely available: for this reason the environment is “favorable” to the production process. Consequently, the value of $\hat\lambda_n(x, y \mid z)$ will be much smaller (greater efficiency) than $\hat\lambda_n(x, y)$ for small values of Z than for large values of Z. Here again, as for the input oriented case, this is due to the fact that firms with small values of Z do not take advantage of the favorable environment, and then, when Z is taken into account, their efficiency scores improve, i.e. the value of $\hat\lambda_n(x, y \mid z)$ is smaller, indicating a greater efficiency. The ratios $Q^z_i = \hat\lambda_n(X_i, Y_i \mid Z_i) / \hat\lambda_n(X_i, Y_i)$ will increase with Z, on average.
In the case of unfavorable Z, the environmental variable works as a “compulsory” or unavoidable output to be produced to face the negative environmental condition. Z in a certain sense penalizes the production of the outputs of interest. In this situation, $\hat\lambda_n(x, y \mid z)$ will be much smaller than $\hat\lambda_n(x, y)$ for large values of Z. Here, firms with a high level of Z are “more” negatively influenced by the environment than firms with a low level of Z: for that reason their efficiency score taking Z into account is much higher than their unconditional efficiency. As a result, the regression line of $Q^z_i = \hat\lambda_n(X_i, Y_i \mid Z_i) / \hat\lambda_n(X_i, Y_i)$ over Z will be decreasing.
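[Figure 5.1. Top panel: firms plotted with the input X on the horizontal axis (points O, P, Q, R, S, T, U) and Z on the vertical axis (levels z0, z1, z2); firms A, B and C and the frontier φ(y) are marked. Bottom panel: ratios Q^z on the vertical axis (levels 1 and >>1) against Z on the horizontal axis (z2, z1, z0), with firms A, B and C marked.]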
In Figure 5.1 we show a simple example to explain why the smoothed re-
gression line of the ratios Qzi = θ̂n (Xi , Yi | Zi )/θ̂n (Xi , Yi ) on Z is decreasing
when Z is favorable. Hence, Figure 5.1 (bottom panel) explains the decreasing
trend of the smoothed regression of the ratios Qz on Z whereas the top panel
The efficient frontier FDH (φ(y)) is given by the minimum value of input X
used among the analyzed firms (here y is equal for all firms); it corresponds to
the value OP .
For the firms A, B and C, the conditional and unconditional FDH efficiency
scores, together with their ratios Qz , are the following:
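\begin{align*}
\hat\theta_A(x_0, Y \mid z_0) &= \frac{OP}{OQ}; & \hat\theta_A(x_0, Y) &= \frac{OP}{OQ} & &\Rightarrow\ Q^z_A = 1; \\
\hat\theta_B(x_1, Y \mid z_1) &= \frac{OR}{OS}; & \hat\theta_B(x_1, Y) &= \frac{OP}{OS} & &\Rightarrow\ Q^z_B > 1; \\
\hat\theta_C(x_2, Y \mid z_2) &= \frac{OT}{OU}; & \hat\theta_C(x_2, Y) &= \frac{OP}{OU} & &\Rightarrow\ Q^z_C \gg 1.
\end{align*}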
Note that firm A has the highest value of Z (compared with firm B and firm
C). Due to the “substitution effect” between Z and X, in correspondence to
this value of Z = z0 , we have the lowest value of the minimum of X (in this
case OP ). Firm B has a level of Z = z1 lower than z0 but higher than z2 . As
a result, for firm B the minimum value of X taking Z into account is OR, that
is higher than OP but lower than OT (the minimum value of X for the firm C
taking Z into account). As a consequence, we have the corresponding order of
the ratios QzC > QzB > QzA .
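[Figure. Top panel: firms plotted with the input X on the horizontal axis (points 0, P, Q, R, S, T, T') and Z on the vertical axis (levels z0, z1, z2); firms A, A', B and C and the frontier φ(y) are marked. Bottom panel: ratios Q^z on the vertical axis (levels 1 and >>1) against Z on the horizontal axis (z2, z1, z0), with firms C, B and A = A' marked.]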
All that has been said for the input-oriented case applies mutatis mutandis also for the interpretation of the output-oriented case, where an increasing nonparametric regression line points to a positive effect of Z on the production efficiency whilst a decreasing nonparametric regression line points to a negative impact of Z. By construction, the $\hat\lambda(x, y)$ ($\hat\theta(x, y)$) scores are ≥ 1 (≤ 1), $\hat\lambda(x, y \mid z) \le \hat\lambda(x, y)$ ($\hat\theta(x, y \mid z) \ge \hat\theta(x, y)$), and therefore $Q^z_i \le 1$ ($Q^z_i \ge 1$).
The same kind of reasoning applies again for the order-m and order-α conditional and unconditional efficiency scores. Note that here the ratios $Q^z_{m,i}$ and $Q^z_{\alpha,i}$ are not bounded by 1, and $\hat\lambda_m(x, y \mid z)$ ($\hat\theta_m(x, y \mid z)$) is not necessarily ≤ $\hat\lambda_m(x, y)$ (≥ $\hat\theta_m(x, y)$); the same holds for the order-α scores entering the $Q^z_{\alpha,i}$ ratios.
Moreover, robust ratios have the advantage of being able to show the impact of external factors even if some extreme observations may mask it when using full frontier ratios. This case is illustrated in Figures 5.3 and 5.4, which show in the top panel the case of a production process consisting of a fridge, as we have seen above, in which there are 3 units (marked in the picture as bigger stars with a dotted square around them) consisting of fridges of a new generation that are perfectly insulated and hence not influenced by the external temperature. In this case, we see from Figure 5.3 (bottom panel) that full frontier ratios are not able to capture the effect of the external factor: the smoothed nonparametric regression line is flat due to the influence of these extremes. On the contrary, Figure 5.4 (bottom panel) shows what the impact of the external factor is, because partial frontiers are not influenced by these new-generation fridges. We will see in Section 6.4 an illustration of this case on real data.
Figure 5.3. The Fridge case with outliers. Full frontier case
Figure 5.4. The Fridge case with outliers. Partial frontiers case
Note that the externality index defined in equation (5.34) may be estimated using the Nadaraya-Watson nonparametric estimator defined above (see equation (5.30)) or another nonparametric estimator.
For the output-oriented case, in equation (5.32) we only have to substitute:
\[ CE^z(x, y) \equiv \lambda(x, y \mid z); \qquad UE(x, y) \equiv \lambda(x, y). \tag{5.37} \]
The same applies to the robust (order-m and order-α) measures, where we have to substitute in equation (5.32) the corresponding conditional and unconditional efficiency scores in the selected orientation.
For the interpretation of the proposed indicators, and the evaluation of the influence of Z at the firm level, a major role is played by the ratio $Q^z$, the ratio of the conditional to the unconditional measure of efficiency. In either an input or an output orientation, $Q^z = 1$ means that the conditional and unconditional efficiency scores are equal (this applies both to full and robust efficiency scores). This value of the ratio points to a situation in which the external factors do not affect the performance of the analysed firm. On the contrary, if we are in an input-oriented framework and $Q^z > 1$, this means that taking Z into account leads to a higher efficiency score for the firm.
The externality index $EI^z(x, y) = E(Q^z \mid Z)$ represents the expected influence of Z on the performance of the firm: it depends on the firm's own level of Z. It should be interpreted taking into account the global effect of Z on the production process and what was said above for the ratio $Q^z$. If $EI^z = 1$, the firm operates in a situation in which, given the level of the environment, we expect $Q^z$ to be equal to one. When $EI^z > 1$, the firm works at a level of the environment with an expected $Q^z > 1$.
Finally, the Individual Index tells us how the firm performed with respect to the expected value of its performance; i.e. $II^z = 1$ means that the firm's $Q^z$ is exactly equal to $E(Q^z \mid Z)$. If $II^z > 1$, the effect of the environment on the efficiency score of the firm under consideration is higher than its expected value. On the contrary, if $II^z < 1$, we are considering a firm for which the environmental externality is lower than expected for its level of Z.
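The three indicators can be illustrated with a short Python sketch. It assumes that the decomposition in equation (5.32) is multiplicative, CE = UE × EI × II with II = Q / EI, which is consistent with the interpretation given above, and it estimates EI with a Nadaraya-Watson regression using a Gaussian kernel. The efficiency scores, the bandwidth `h` and the function names are hypothetical.

```python
import math


def nadaraya_watson(z0, Z, Q, h):
    """Kernel-weighted average of Q around z0 (Gaussian kernel, bandwidth h)."""
    weights = [math.exp(-0.5 * ((z0 - zi) / h) ** 2) for zi in Z]
    return sum(w * q for w, q in zip(weights, Q)) / sum(weights)


def decompose(UE, CE, Z, h):
    """Return (Q, EI, II) for each firm, assuming CE = UE * EI * II."""
    Q = [ce / ue for ce, ue in zip(CE, UE)]          # conditional / unconditional
    EI = [nadaraya_watson(z, Z, Q, h) for z in Z]    # estimate of E(Q | Z = z)
    II = [q / ei for q, ei in zip(Q, EI)]            # firm-specific deviation
    return Q, EI, II


# Hypothetical input-oriented scores (theta <= 1, conditional >= unconditional).
UE = [0.50, 0.60, 0.80, 0.40]
CE = [0.75, 0.60, 0.90, 0.80]
Z = [1.0, 2.0, 3.0, 4.0]

Q, EI, II = decompose(UE, CE, Z, h=1.0)
for ue, ce, ei, ii in zip(UE, CE, EI, II):
    # the product of the three components gives back the conditional score
    print(round(ei, 3), round(ii, 3), round(ue * ei * ii, 3), ce)
```

A firm with II above 1 in this sketch has a ratio Q larger than the kernel-smoothed average of the ratios of firms with a similar Z.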
Summing up, by considering the above indications, consulting the smoothed nonparametric regression plot of $Q^z$ over Z, and taking into account the minimum and maximum levels of Z, we are able to interpret the effect of Z on the efficiency score at the firm level, by decomposing the conditional efficiency score into its main components: unconditional efficiency, externality index and individual index.
The same interpretation, given for the input-oriented case, can be made, mutatis mutandis, in the output-oriented framework, recalling that $\lambda(x, y) \ge 1$, and hence $Q^z \le 1$.
5.5.1 Univariate Z
In this example, we simulate a multi-input (p = 2) and multi-output (q = 2) production process in which the function describing the efficient frontier is (as in Park, Simar and Weiner, 2000) the following:
\[ y^{(2)} = 1.0845 \, (x^{(1)})^{0.3} (x^{(2)})^{0.4} - y^{(1)} \tag{5.38} \]
where $y^{(j)}$ ($x^{(j)}$) denotes the jth component of y (of x) for j = 1, 2. We draw $X_i^{(j)}$ independent uniforms on (1, 2) and $\tilde{Y}_i^{(j)}$ independent uniforms on (0.2, 5). Then the generated random rays in the output space are characterized by the slopes $S_i = \tilde{Y}_i^{(2)} / \tilde{Y}_i^{(1)}$. Finally, the generated random points on the frontier are defined by:
\[ Y_{i,eff}^{(1)} = \frac{1.0845 \, (X_i^{(1)})^{0.3} (X_i^{(2)})^{0.4}}{S_i + 1} \tag{5.39} \]
\[ Y_{i,eff}^{(2)} = 1.0845 \, (X_i^{(1)})^{0.3} (X_i^{(2)})^{0.4} - Y_{i,eff}^{(1)}. \tag{5.40} \]
The efficiencies are generated by $\exp(-U_i)$, where the $U_i$ are drawn from an exponential with mean µ = 1/3. Finally, in a standard setup (without environmental factors), we define $Y_i = Y_{i,eff} \cdot \exp(-U_i)$.
On this data set, we introduce the dependency on an environmental factor Z, adapting Case 1 of Daraio and Simar (2005a). Z is uniform on (1, 4) and such that it has a quadratic negative impact on the production process till a Z value of 2.5 and then a quadratic positive impact (here we consider an output-oriented framework):
\[ Y_i^{(1)} = [1 + (Z - 2.5)^2] \cdot Y_{i,eff}^{(1)} \cdot \exp(-U_i) \tag{5.41} \]
\[ Y_i^{(2)} = (1 + |Z - 2.5|) \cdot Y_{i,eff}^{(2)} \cdot \exp(-U_i). \tag{5.42} \]
We simulate n = 100 observations according to this scenario.
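The data generating process of equations (5.38) to (5.42) can be sketched in Python as follows. This is only an illustration: the seed, the function names (`frontier`, `simulate`) and the layout of the returned sample are assumptions, and the random draws will not reproduce the sample used for the figures.

```python
import math
import random


def frontier(x1, x2):
    """Total efficient output y1 + y2 implied by equation (5.38)."""
    return 1.0845 * x1 ** 0.3 * x2 ** 0.4


def simulate(n=100, seed=0):
    """Draw n observations ((x1, x2), (y1, y2), z) from the univariate-Z scenario."""
    rng = random.Random(seed)
    sample = []
    for _ in range(n):
        x1, x2 = rng.uniform(1, 2), rng.uniform(1, 2)
        y_tilde1, y_tilde2 = rng.uniform(0.2, 5), rng.uniform(0.2, 5)
        slope = y_tilde2 / y_tilde1                 # random ray in the output space
        total = frontier(x1, x2)
        y1_eff = total / (slope + 1)                # equation (5.39)
        y2_eff = total - y1_eff                     # equation (5.40)
        u = rng.expovariate(3.0)                    # exponential with mean 1/3
        z = rng.uniform(1, 4)
        y1 = (1 + (z - 2.5) ** 2) * y1_eff * math.exp(-u)   # equation (5.41)
        y2 = (1 + abs(z - 2.5)) * y2_eff * math.exp(-u)     # equation (5.42)
        sample.append(((x1, x2), (y1, y2), z))
    return sample


sample = simulate()
print(len(sample), sample[0])
```

Note that `random.expovariate` takes the rate (the inverse of the mean) as its argument, hence the value 3.0 for a mean of 1/3.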
In the nonparametric estimation, we have chosen a truncated Gaussian kernel for the smoothing; we remark that the results are very stable if other kernels with compact support are used. Figure 5.5 illustrates the likelihood cross-validation plot for the choice of the number of nearest neighbours (NN), which in this case is 18.
[Plot for Figure 5.5: likelihood CV criterion (vertical axis) against the value of N, nearest neighbours (horizontal axis).]
Figure 5.5. Simulated example with univariate Z. Likelihood cross-validation plot for the choice of the number of nearest neighbours (NN). Here the number of nearest neighbours (k-NN) which maximizes the likelihood cross-validation criterion is 18.
[Plots for Figure 5.6: percentage of points outside the order-α frontiers against the values of α, and percentage of points out of the m-frontiers against the values of m.]
Figure 5.6. Simulated example with univariate Z. Plots of the percentage of points outside order-m and order-α frontiers.
For the choice of the values of m and α, the inspection of Figure 5.6 is particularly useful, as it provides a sensitivity analysis of the percentage of points lying outside the partial frontiers by more than a threshold value of 0.15 (we applied the procedure described in Section 4.4.4; in particular, we use τ = 0.15 in Equation (4.30)). We have then chosen m = 50 and α = 0.985, such that the percentage of points outside the partial frontiers is close to zero. These values are such that both order-α and order-m efficiency scores are very close to FDH efficiency scores, because in this scenario we do not have outliers. In practice, the choice of these two “tuning” parameters (m and α) may also be governed by their economic interpretation.
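The kind of sensitivity analysis described above can be sketched as follows for the order-m case. The sketch approximates the output-oriented order-m score by Monte Carlo (the expected best performance among m units drawn with replacement from those using no more inputs) and counts a unit as lying outside the partial frontier when its score is below 1. The threshold τ of Section 4.4.4 is not reproduced here, and the data, the number of replications `B` and the function names are hypothetical.

```python
import random


def dominating_ratios(x, y, X, Y):
    """min_j Y_ij / y_j for every unit i using no more of each input than x."""
    return [min(b / a for a, b in zip(y, yi))
            for xi, yi in zip(X, Y)
            if all(c <= d for c, d in zip(xi, x))]


def order_m_output_score(x, y, X, Y, m, B, rng):
    """Monte Carlo approximation of the order-m output efficiency score."""
    ratios = dominating_ratios(x, y, X, Y)
    total = 0.0
    for _ in range(B):
        # best of m units drawn with replacement among those with X_i <= x
        total += max(rng.choice(ratios) for _ in range(m))
    return total / B


def share_outside(X, Y, m, B=200, seed=0):
    """Share of observed units with an order-m score below 1."""
    rng = random.Random(seed)
    scores = [order_m_output_score(x, y, X, Y, m, B, rng) for x, y in zip(X, Y)]
    return sum(s < 1.0 for s in scores) / len(scores)


# Hypothetical one-input, one-output sample.
X = [[1.0], [2.0], [3.0], [4.0], [5.0]]
Y = [[1.0], [2.0], [3.0], [4.0], [2.0]]

for m in (1, 5, 200):
    print(m, share_outside(X, Y, m))
```

As m grows, the order-m score approaches the FDH score and the share of units left outside the partial frontier falls towards zero, which is the pattern one reads from plots such as Figure 5.6.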
The results are displayed in Figure 5.7. In all panels (top for the FDH case, middle for the α frontier case and bottom for the m frontier case), as expected, we see that the ratios (of conditional to unconditional FDH, α and m efficiency scores) allow us to detect the “U-shaped” effect of Z on the production process.
5.5.2 Multivariate Z
Two independent components
In this exercise, the multi-input (p = 2) and multi-output (q = 2) data set is simulated according to the same scenario described in Section 5.5.1, but the Z variable is now drawn from a bivariate normal distribution with mean $\mu = [2.5 \;\; 2.5]$ and covariance matrix
\[ \Sigma = \begin{pmatrix} 0.25 & 0 \\ 0 & 0.25 \end{pmatrix}. \]
The dependence of the production process on Z is introduced as follows:
\[ Y_i^{(1)} = (1 + |Z_1 - 2.5|^3) \cdot Y_{i,eff}^{(1)} \cdot (1 + Z_2) \cdot \exp(-U_i) \]
\[ Y_i^{(2)} = (1 + |Z_1 - 2.5|^3) \cdot Y_{i,eff}^{(2)} \cdot (1 + Z_2) \cdot \exp(-U_i), \]
where the $Y_{i,eff}^{(\cdot)}$ are generated as in Section 5.5.1, and the $U_i$ are drawn from an exponential with mean µ = 1/2. Again, we simulate n = 100 observations according to this scenario, which defines a U-shaped pattern around 2.5 for $Z_1$ and a linear pattern for $Z_2$.
In the nonparametric estimation, we have chosen a truncated Gaussian kernel for the smoothing as before, which can easily be generalised to the multivariate case. Figure 5.8 illustrates the likelihood cross-validation plot for the choice of the number of nearest neighbours (NN = 30) for the estimation of the density of Z, which we use for computing the conditional measures of efficiency. Figure 5.9 shows the estimated density of Z obtained with our k-NN approach, as well as its contour plot. As the contour plot shows, $Z_1$ and $Z_2$ are independent.
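The selection of the number of nearest neighbours can be sketched as follows. The sketch is deliberately simplified and rests on several assumptions that are not taken from the text: Z is univariate, the Gaussian kernel is truncated to [-1, 1], the local bandwidth at each point is the distance to its k-th nearest neighbour, and the criterion is the average leave-one-out log-likelihood of the resulting density estimate. The sample and the grid of candidate values are hypothetical.

```python
import math
import random

# Normalising constant of a standard Gaussian truncated to [-1, 1] (assumed support).
_NORM = math.sqrt(2.0 * math.pi) * math.erf(1.0 / math.sqrt(2.0))


def truncated_gaussian(u):
    """Gaussian kernel restricted to |u| <= 1 and rescaled to integrate to one."""
    return math.exp(-0.5 * u * u) / _NORM if abs(u) <= 1.0 else 0.0


def loo_log_likelihood(Z, k):
    """Average leave-one-out log-likelihood of a k-NN kernel density estimate."""
    n = len(Z)
    total = 0.0
    for i, zi in enumerate(Z):
        dist = sorted(abs(zj - zi) for j, zj in enumerate(Z) if j != i)
        h = dist[k - 1]                 # local bandwidth: distance to the k-th neighbour
        density = sum(truncated_gaussian(d / h) for d in dist) / ((n - 1) * h)
        total += math.log(density)
    return total / n


def choose_k(Z, candidates):
    """Candidate k with the largest cross-validation criterion."""
    return max(candidates, key=lambda k: loo_log_likelihood(Z, k))


rng = random.Random(1)
Z = [rng.gauss(2.5, 0.5) for _ in range(100)]   # hypothetical univariate sample
candidates = list(range(10, 51, 5))
best_k = choose_k(Z, candidates)
print(best_k, round(loo_log_likelihood(Z, best_k), 4))
```

Plotting `loo_log_likelihood(Z, k)` against k gives a curve of the same kind as the likelihood cross-validation plots shown in the figures; the sketch assumes there are no tied observations, so that the local bandwidth is never zero.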
To set the value of m and α to compute the robust partial efficiency scores
for the simulated dataset whose DGP has been described above, we plot in
Figure 5.10 the percentage of points which are outside the partial frontiers after
a threshold value of 0.15. By inspecting Figure 5.10 we choose the value of
m = 35 and the value of α = 0.965 so that we leave outside a percentage of
points close to zero.
Some results are displayed in Figures 5.11 and 5.12. In particular, Figure 5.11 shows the global impact of the external variables Z on the simulated production process by providing the surface of $Q^z$ on $Z_1$ and $Z_2$.
[Figure 5.7 (caption not recovered): three panels with vertical axes Qz, Qzα and Qzm.]
[Plot for Figure 5.8: likelihood CV criterion (vertical axis) against the value of N, nearest neighbours (horizontal axis).]
Figure 5.8. Simulated example with multivariate Z. Likelihood cross validation plot for the
choice of the number of the Nearest Neighbourhood (NN). Here the number of k − NN which
maximizes the likelihood cross validation criterion is 30.
[Figure 5.9 (caption not recovered): estimated density of Z as a surface over the values of Z1 and Z2, with its contour plot.]
[Plots for Figure 5.10: percentage of points outside the order-α frontiers against the values of α, and percentage of points out of the m-frontiers against the values of m.]
Figure 5.10. Simulated example with multivariate Z. Plots of the percentage of points outside
order-m and order-α frontiers.
[Figure 5.11 (caption not recovered): surface of Qz over Z1 and Z2.]
[Plots for Figure 5.12: Qz against Z1 (top panel) and Qz against Z2 (bottom panel).]
Figure 5.12. Simulated example with multivariate Z. Top panel smoothed nonparametric
regression of Qz on Z1 for Z2 ’s quartiles. Bottom panel smoothed nonparametric regression of
Qz on Z2 for Z1 ’s quartiles. The dashed line corresponds to the first quartile, the solid line to
the median and the dashdot line to the third quartile.
[Plots for Figure 5.13: density of Z over the values of Z1 and Z2, and its contour plot.]
Figure 5.13. Density of Z and Contour plot of the density of Z. Z1 and Z2 are correlated.
[Plots for Figure 5.14: Qzα against Z1 (top panel) and Qzα against Z2 (bottom panel).]
Figure 5.14. Simulated example with multivariate Z. Z1 and Z2 are correlated. Top panel
smoothed nonparametric regression of Qzα on Z1 for Z2 ’s quartiles. Bottom panel smoothed
nonparametric regression of Qzα on Z2 for Z1 ’s quartiles. Dashed line = first quartile, solid line
= median, and dashdot line = third quartile.
Sensitivity to outliers
To complete the simulated illustration of our econometric methodology, we introduce 5 outliers in the multivariate Z (independent components) simulation setting described at the beginning of this section.
These extreme points are introduced at the following values of X: (1.25, 1.5), (1.25, 1.75), (1.5, 1.5), (1.75, 1.25) and (1.5, 1.25); the corresponding values for the slopes in the Y space are (0.25, 0.75, 1, 3, 5). The corresponding values of Z have been drawn from a bivariate normal with mean µ and covariance Σ (as above). Finally, the outliers were projected outside the true frontier in the output direction by multiplying by a factor of 2.5.
The results are displayed in Figures 5.15 and 5.16. As clearly appears, the FDH estimator, in the presence of the 5 outliers, fails to detect the correct quadratic effect of $Z_1$ on the production process (the curvature is missed, see Figure 5.15), but the order-α estimator is able to reproduce the simulated effect of $Z_1$ (see Figure 5.16). We obtain a similar result for the order-m case.
[Figure 5.15 (caption not recovered): surface of Qz over Z1 and Z2, followed by plots of Qz against Z1 and against Z2.]
[Plots for Figure 5.16: surface of Qzα over Z1 and Z2, followed by plots of Qzα against Z1 and against Z2.]
Figure 5.16. Simulated example with multivariate Z and 5 outliers. Surface of Qzα on Z1
and Z2 (top graph). Bottom graph: smoothed nonparametric regression of Qzα on Z1 for Z2 ’s
quartiles (top panel) and on Z2 for Z1 ’s quartiles (bottom panel); dashed line = first quartile,
solid line = median and dashdot line = third quartile.
This illustration confirms that it is always useful to compare the results ob-
tained using full frontier efficiency estimators with those obtained applying
robust partial frontiers measures.
Chapter 6
INSURANCE INDUSTRY:
IN SEARCH FOR ECONOMIES OF SCALE,
SCOPE AND EXPERIENCE IN THE
ITALIAN MOTOR-VEHICLE SECTOR
6.1 Introduction
The analysis of the Italian insurance market is interesting because of the dramatic changes that have occurred in the business during the past two decades. With more than 43 million vehicles in circulation, 754 vehicles per thousand inhabitants and 103 vehicles per km of road, 13,842 million euros of direct premiums and 416 million euros of losses in 2001 (data from ANIA, the Italian Association of Insurance Companies), the automobile liability business is the most important insurance line in Italy, accounting for about 60.7% of non-life insurance business and for almost 24% of total insurance premiums (2001). Its nature as a compulsory insurance, the increase in its tariffs and their influence on the inflation rate, together with the growing role played by biological damage reimbursements and by fraud, have opened a deep and sometimes harsh debate, in both the political and the technical arenas, on the measures to be adopted in order to keep premium levels under control, to increase efficiency and to promote competition.
The insurance industry in Italy, as well as in other European countries, has
been traditionally subject to stringent regulation affecting pricing, contractual
provisions, establishment of branches, solvency standards, and numerous ad-
ditional operational details. Competitive intensity was very low, with minimal
price and product competition and stable profit margins (Swiss Re 1996, 2000b).
The implementation of the EU’s Third Generation Directives, beginning on July 1st, 1994, represented a major step in creating conditions in the EU resembling those in a single deregulated national market. In recent years, numerous events have taken place that have reduced existing barriers among financial institutions and countries and increased the competitive pressures in the market. The following are some of the most relevant factors: a) changes in the national and the EU law systems reducing the barriers among financial institutions and increasing information transparency between the insurer/distributor and the customer; b) development of new communication and information technologies and of new management methods (with the consequent increase of information transmission between the insurer and the distributors); c) increased financial sophistication of customers, who are more interested in financial problems, better educated, and more demanding; d) internationalization of markets; e) increased importance of insurance products in families’ savings portfolios; f) growing inadequacy of social security pension systems and the consequent increase in the demand for life insurance products; and g) birth of new financial intermediaries.
Despite these changes, an analysis by Swiss Re (2000b) finds that personal lines insurance markets have remained localised. One reason explaining
the slowness of the emergence of cross-border competition is that the European
Directives did not completely eliminate the ability of host countries to influence
insurance markets. For example, EU member countries can still utilize taxa-
tion to discriminate between domestic companies and those based in other EU
countries (Hess and Trauth 1998). In addition, there are significant differences
in contract law across European nations (Swiss Re 1996), impeding contract
standardisation. Domestic insurers also are likely to have an advantage in their
home markets because of cultural affinities, established brand names and dis-
tribution networks, and buyer perceptions that such firms have higher quality
or financial stability than foreign firms. Finally, foreign insurers may be at a
disadvantage in comparison with domestic insurers in terms of their knowledge
of the underwriting characteristics of buyers, exposing foreign firms to higher
informational asymmetry and adverse selection problems in comparison to domestic firms. It is interesting to see in more detail how these recent changes have affected the Italian insurance market.
The fundamental year for the Italian regulation of the insurance sector was 1912 (Law 04/04/1912, no. 305), when the Istituto Nazionale delle Assicurazioni (INA) was created and the principles of “authorisation of admission” and of “control on tariffs” were ratified (this law regulated only the life business). With the transfer, in 1923, of the control of the insurance sector to the Ministry of Industry, a long period began in which insurance companies experienced a kind of subjection to the public administration. Only from the Seventies do we observe a deep process of legislation, mainly due to the European Community regulations. There were three basic sets of directives.
The first directives regulate and harmonise the discipline of the “freedom of establishment”: a company with its head office in an EU member country can open branches in other EU countries, which are given control over the activity of the branch, according to the principle of “host country control”.
The second directives, besides partially modifying the first ones, deal with the “freedom of services”, in particular with reference to industrial and commercial risks, and to automobile insurance. These directives provide for the possibility for insurers to operate in other EU countries without trade barriers and without the obligation to open a head office in loco.
The third directives, besides partially modifying the first and the second ones, ratify: a) the application of the principle of “home country control”; b) the “single EU licence” allowing insurers to operate in the whole EU; c) the deregulation of the control of tariffs.
The Legislator’s intervention encourages the transition from a strongly protectionist context to a wider and free market for insurance. In particular, July 1st, 1994 was a milestone for the insurance sector. With the coming into force of the third directives on life and non-life business, in fact, a common European market for insurance services was created. The aim of this regulation is to promote competition among insurers and to benefit customers in terms of a wider supply, a reduction of tariffs and an increase in the quality of services.
The motor-vehicle insurance business, in particular, has undergone the strongest deregulation process. Starting from July 1st, 1994, the public Authorities could no longer control tariffs and insurance policy conditions. The companies became free to fix prices according to customers’ risk attitudes, and introduced the new tariff system based on the bonus/malus mechanism. In Italy, seven direct selling companies for telephone and on-line selling were set up, and services started to improve with the opening of call centres working 24 hours a day.
Today, in Italy the insurance sector is under the control of three Authorities:
a) the Consob (National Commission for the stock exchange market), on the
subject of transparency on the company’s information (Law 58/98); b) the
ISVAP (Italian Control and Vigilance Authority of the insurance sector), on the
subject of stability and of transparency on premiums and tariffs (Law 576/82);
c) the Antitrust Authority, on the subject of competition (Law 287/90).
In recent years the debate between insurers and Customers Associations has been very harsh.
On the one side, companies maintain that the automobile market is very competitive, as the high differentiation in tariffs demonstrates. Moreover, the newly-born companies dealing with telephone and on-line selling, having lower operating costs, have played a vital role in improving market efficiency by limiting tariff increases.
On the other side, Customers Associations have strongly argued that after 1994 tariffs have grown, depending on the estimate, by between 50% and 100%.
27 For an analysis of the information exchange and of the Italian Antitrust intervention in the insurance market
production. Moreover, firms offering different product lines also may realize
economies of scope. Nevertheless, the empirical evidence on economies of
scale and scope is contradictory: while some studies found efficiency gains
others show no efficiency gains or efficiency losses.29
It is also interesting to analyse the impact of the age of companies on their performance, as a proxy for their ability to survive in an increasingly competitive market, and as a proxy for the experience acquired over time.
The main aim of this chapter is to provide new empirical evidence on classic industrial organisation topics such as economies of scale, scope and experience, analysing the Italian motor-vehicle insurance business. In addition, we provide a bootstrap-based test on returns to scale and a bias-corrected estimation of the efficiency scores of Italian insurers (along with 95% confidence intervals). Analysing data for 2000, we also provide a test comparing the structural efficiency of ‘fined’ vs. ‘non fined’ companies. Fined companies are insurers hit in 2000 by the Antitrust measure no. 8546 (recalled above) for anti-competitive behavior, while the non fined ones are companies not sanctioned by the Antitrust measure.
The chapter is organised as follows. The next section presents the data
analysed, the inputs and outputs chosen as well as a normalized principal com-
ponents analysis to explore the dataset. After that, a procedure allowing the
aggregation of inputs and outputs is illustrated. The section that follows shows
the results of the bootstrapping exercise for a sensitivity analysis of the effi-
ciency scores and for a test on returns to scale. Then, economies of scale, scope
and experience are analysed and, finally, the main results are summarised in the
concluding section.
29 Useful surveys are Berger and Humphrey (1997); Cummins and Weiss (2001), and Amel, Barnes, Panetta
and Salleo (2002). See also Harker and Zenios (2000) for an overview on the performance of financial
institutions and its linkages with efficiency, innovation and regulation.
[Plot for Figure 6.1: series MV.NRG, ONL.NRG, TNL.NRG, TL.NRG and TOT.NRG.]
Figure 6.1. Nominal rate of growth - gross premiums (direct business) by line of business,
years 1982-2001.
[Plot for Figure 6.2: series MV.RRG, ONL.RRG, TNL.RRG, TL.RRG and TOT.RRG.]
Figure 6.2. Real rate of growth - gross premiums (direct business) by line of business, years
1982-2001.
Variable    Definition
NRG         Nominal rate of growth in year t: [(premium year t / premium year t-1) * 100 - 100]. It is the increase of the nominal premium in the business considered.
RRG         Real rate of growth = NRG - CPI.
CPI         Consumer Price Index (percentage variation on the previous year; up to 1995 for Italy, from 1996 the harmonised Consumer Price Index for Europe. Source: Bank of Italy).

Variable    Definition
MV.NRG      Motor vehicles premiums, nominal rate of growth
MV.RRG      Motor vehicles premiums, real rate of growth
ONL.NRG     Other non-life premiums, nominal rate of growth
ONL.RRG     Other non-life premiums, real rate of growth
TNL.NRG     Total non-life premiums, nominal rate of growth
TNL.RRG     Total non-life premiums, real rate of growth
TL.NRG      Total life premiums, nominal rate of growth
TL.RRG      Total life premiums, real rate of growth
TOT.NRG     Total premiums, nominal rate of growth
TOT.RRG     Total premiums, real rate of growth
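The two definitions in the first table can be written as a pair of small Python functions. The premium figures and the CPI variation used below are hypothetical and serve only to show the arithmetic.

```python
def nominal_rate_of_growth(premium_t, premium_prev):
    """NRG in percent: (premium year t / premium year t-1) * 100 - 100."""
    return premium_t / premium_prev * 100 - 100


def real_rate_of_growth(nrg, cpi):
    """RRG in percentage points: NRG minus the CPI variation."""
    return nrg - cpi


# Hypothetical premiums (any currency unit) and CPI variation (percent).
nrg = nominal_rate_of_growth(premium_t=1100.0, premium_prev=1000.0)
rrg = real_rate_of_growth(nrg, cpi=2.5)
print(round(nrg, 2), round(rrg, 2))
```

With these illustrative numbers the nominal growth is 10% and, after subtracting a CPI variation of 2.5%, the real growth is 7.5%.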
1992-1994 period, and that after the deregulation period, both the nominal and
the real rate of growth of premiums have started to increase again (with a stop
in 2000 and 2001, when the Government imposed a price freeze on motor-third
party liability). This is particularly significant because the rate of growth of
premiums of other non-life lines continued to decline also after 1994.
It is interesting to note the difference in the evolution of the real rate of growth of the automobile business in comparison with the real rate of growth of the other non-life lines (see Figure 6.2). In fact, as most of them are mature lines, we would have expected them to experience similar dynamics.
In the analysis which follows we use ANIA (Italian Association of Insurance
Companies) official data. We had access to the balance sheet and income
statements data as well as several features of the companies: date of foundation,
information on the activity of the firm, i.e. if it is a generalist insurer operating
30 The total number of companies hit by the Italian Antitrust Authority is 41, hence in our sample we have more than 90% of the overall fined companies.
Table 6.4. Descriptive statistics on inputs, outputs and external factor considered in the analysis.
Italian motor vehicle insurers (78 obs).
the Italian insurance market (Turchetti and Daraio, 2004; Cummins, Turchetti
and Weiss, 1996).
In order to avoid the curse of dimensionality of nonparametric estimation (here we have only 78 observations in a space of 3+2 dimensions) we tried to reduce the dimension of the space of the analysis. After an exploratory analysis, in Section 6.2.3 we illustrate a statistical methodology which may be useful for dimension reduction in productivity analysis.
Table 6.5. Correlations matrix. Italian motor vehicle insurers (78 obs).
Table 6.6. Eigenvalues and percentages of variances explained. Italian motor vehicle insurers
(78 obs).
Table 6.7. Correlations of the first two pc’s with the original variables (Factors loadings).
Italian motor vehicle insurers (78 obs).
little linear relationship between the dimension and the management index of the units. The age of the companies (the number of years they have been active on the market) is mainly related to the level of activity and, interestingly, seems not to be related to the management index.
[Plot for Figure 6.3: correlations circle of the variables in1, in2, in3, ou1, ou2, zx1, zx2 and zx3 on Factor 1 (horizontal axis) and Factor 2 (vertical axis).]
Figure 6.3. Projection of the variables for the Italian Motor-vehicle insurers dataset, year 2000.
[Plot for Figure 6.4: insurers, labelled by number, plotted on the first two principal components.]
Figure 6.4. Projection of the individual insurers for the Italian Motor-vehicle dataset, year 2000. Right panel: a zoom.
Figure 6.4, left panel, provides a two-dimensional picture of the insurance companies mapped on the two principal components. The interpretation of this picture is facilitated by looking simultaneously at Figure 6.3, which gives the weights of the original variables in the principal components. Hence, on the left of Figure 6.4, we have the big insurance companies (in terms of their level of activity); as an example, company n. 3 is Assicurazioni Generali, company n. 1 is RAS, n. 10 is SAI, n. 8 is Fondiaria and so on (these companies are among the biggest and best known insurance companies in Italy). On the right, small units are present (the origin is the average point). On the north of the picture we find companies with a high management index; on the south, on the contrary, companies with a low management index. It appears that company n. 37 is particularly active along the management dimension, as well as company n. 78 and company n. 47. These companies are, respectively, Direct Line, DB Assicura and Dialogo Assicurazioni, three relatively new companies working through call centres, which do not have a big activity dimension but put
Table 6.8. Input Factor inertia. Italian motor vehicle insurers (78 obs).
%inertia %cumul
0.9429 0.9429
0.0509 0.9937
0.0063 1.0000
Table 6.9. Eigenvectors of the matrix X′X. Italian motor vehicle insurers (78 obs).
Table 6.10. Correlations between the input factor (Fin ) and inputs. Italian motor vehicle
insurers (78 obs).
of inertia which is explained by this first factor (see Table 6.8, first column).
When this ratio is high (close to 1), it indicates that most of the information
contained in the original 3-dimensional data matrix X, is well summarized by
the first factor Fin . Correlations between Fin and X1 , ..., X3 indicate also how
well this new one-dimensional variable represents the original ones (see Table
6.10).
In the output space, the same can be done with the 2 (scaled) output variables, providing one output factor:
\[ F_{out} = Y b = b_1 Y_1 + b_2 Y_2 \tag{6.2} \]
where $Y$ $(n \times 2)$ is the data matrix of the (scaled) outputs. Here the vector $b \in \mathbb{R}^2_+$ is the first eigenvector of the matrix $Y'Y$, corresponding to its largest eigenvalue (see Table 6.12, first column).
Table 6.11. Output Factor inertia. Italian motor vehicle insurers (78 obs).
%inertia %cumul
0.9591 0.9591
0.0409 1.0000
Table 6.12. Eigenvectors of the matrix Y′Y. Italian motor vehicle insurers (78 obs).
0.5527 -0.8334
0.8334 0.5527
Table 6.13. Correlations between the output factor (Fout ) and outputs. Italian motor vehicle
insurers (78 obs).
Therefore, in both cases, the factors are a sort of ‘average’ of the (scaled) original variables. The percentage of inertia explained by the first factor is very high in both cases (0.9429 for the input factor, see Table 6.8; 0.9591 for the output factor, see Table 6.11): it is certainly appropriate to summarize the information of the full data matrix by these two one-dimensional factors, without losing too much information. The correlation between the factors and the original variables is also high (above 0.83 for the input case; above 0.90 in the output case, see Tables 6.10 and 6.13). This analysis shows that we may describe the production activity of all these units by only one input factor and one output factor. Nevertheless, to reach this conclusion in a rigorous way, it may be useful to apply the bootstrap-based procedures of Simar and Wilson (2001) to test for aggregation possibilities (restrictions) on inputs and outputs in efficient frontier models. In our case we did not perform this test, due to the high correlations found between the original variables and their aggregate factors.
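The aggregation procedure of this section can be sketched in Python as follows. The sketch computes the first eigenvector of the cross-product matrix by power iteration, builds the one-dimensional factor as the corresponding linear combination of the columns and reports the share of inertia it explains. The data matrix is hypothetical and is assumed to be already scaled; the function names are illustrative, and power iteration is used here only to keep the example free of external libraries.

```python
import math


def cross_product(X):
    """Return the p x p matrix X'X for a data matrix X given as a list of rows."""
    p = len(X[0])
    return [[sum(row[a] * row[b] for row in X) for b in range(p)] for a in range(p)]


def first_eigenpair(M, iterations=200):
    """Largest eigenvalue and unit-norm eigenvector of M by power iteration."""
    p = len(M)
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):
        w = [sum(M[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    eigenvalue = sum(v[a] * sum(M[a][b] * v[b] for b in range(p)) for a in range(p))
    return eigenvalue, v


def aggregate(X):
    """First factor F = X a, its weights a and the share of inertia it explains."""
    M = cross_product(X)
    eigenvalue, a = first_eigenpair(M)
    inertia = eigenvalue / sum(M[j][j] for j in range(len(M)))   # eigenvalue / trace
    factor = [sum(aj * xj for aj, xj in zip(a, row)) for row in X]
    return factor, a, inertia


# Hypothetical scaled inputs for four units (two highly correlated inputs).
X = [[1.0, 2.0], [2.0, 4.1], [3.0, 5.9], [4.0, 8.2]]
factor, a, inertia = aggregate(X)
print([round(c, 4) for c in a], round(inertia, 4))
```

Because the toy inputs are almost proportional, the first factor explains nearly all the inertia, which is the situation in which replacing several variables by a single factor loses little information.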
6.3 Testing returns to scale and bootstrapping efficiency scores
By construction θCRS,n(Xi, Yi) ≤ θVRS,n(Xi, Yi), and we reject the null
hypothesis if the test statistic T is too small. The p-value of the null hypothesis
is then obtained by computing:
\[ p\text{-value} = \mathrm{Prob}\left( T(X_n) \le T_{obs} \mid H_0 \text{ is true} \right), \tag{6.6} \]
where Tobs is the value of T computed on the original observed sample Xn.
Of course, we cannot compute this probability analytically, but we can
approximate it by using the bootstrap algorithm described in Section 3.4.
We simulate B pseudo-samples X_n^{*,b} of size n under the null (i.e. using the
CRS estimate of the frontier for generating the pseudo-samples), and for each
bootstrap sample we compute the value T^{*,b} = T(X_n^{*,b}). The p-value is then
approximated by the proportion of bootstrap samples with values of T^{*,b} less
than or equal to the original observed value Tobs:
\[ p\text{-value} \approx \sum_{b=1}^{B} \frac{ \mathbb{1}\left( T^{*,b} \le T_{obs} \right) }{B}. \tag{6.7} \]
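The approximation in (6.7) is simply a counting exercise once the bootstrap values of the statistic are available. A minimal sketch follows; the function name and the ten bootstrap values are hypothetical, and the generation of the pseudo-samples themselves (Section 3.4) is not reproduced here.

```python
def bootstrap_p_value(t_boot, t_obs):
    """Proportion of bootstrap statistics that are <= the observed statistic."""
    hits = sum(1 for t in t_boot if t <= t_obs)
    return hits / len(t_boot)

# Hypothetical bootstrap values of T generated under the null (illustrative only).
t_boot = [0.97, 0.97, 0.98, 0.99, 0.95, 0.96, 0.99, 0.98, 0.94, 0.97]
p_value = bootstrap_p_value(t_boot, t_obs=0.95)
print(p_value)  # 2 of the 10 values are <= 0.95, so 0.2
```

In practice B is much larger (B = 2000 in the application), so that the proportion is a stable estimate of the tail probability.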
In the application here, with one input-factor and one output-factor as defined
above, we obtain for this test (with B = 2000) a p-value of 0.0055 < 0.05,
hence we reject the null hypothesis of CRS.
Before accepting the VRS hypothesis, Simar and Wilson (2002) suggest
performing a further test in which the null hypothesis is less restrictive than
CRS: we test the non-increasing returns to scale (NIRS) model against the VRS:
H0 : Ψ∂ is globally NIRS against H1 : Ψ∂ is VRS.
The procedure is similar to the preceding one, with CRS replaced by NIRS;
θNIRS,n(Xi, Yi) is computed as in Equation (2.16), where the equality
constraint on the multipliers is replaced by the inequality Σ_{i=1}^{n} γi ≤ 1. The
computations for this second test lead to a p-value of 0.0405 < 0.05. Hence,
we reject H0 (even though this is close to a borderline case) and choose to accept
H1, i.e. the hypothesis of VRS.
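The three estimators compared in these tests differ only in the constraint placed on the multipliers γ. The sketch below writes the standard input-oriented DEA program as a linear program and solves it with `scipy.optimize.linprog`; the function name, the `rts` argument and the four data points are illustrative assumptions, not the authors' code or data.

```python
import numpy as np
from scipy.optimize import linprog

def dea_input_efficiency(X, Y, x0, y0, rts="vrs"):
    """Input-oriented DEA score of (x0, y0) against the sample (X, Y).

    X is (n x p), Y is (n x q). rts selects the constraint on the multipliers:
    'crs' (none), 'nirs' (sum <= 1) or 'vrs' (sum == 1).
    """
    X, Y = np.asarray(X, float), np.asarray(Y, float)
    x0 = np.atleast_1d(x0).astype(float)
    y0 = np.atleast_1d(y0).astype(float)
    n = X.shape[0]
    c = np.r_[1.0, np.zeros(n)]                        # minimise theta
    A_ub = np.vstack([
        np.hstack([-x0[:, None], X.T]),                # sum_i g_i X_i <= theta * x0
        np.hstack([np.zeros((Y.shape[1], 1)), -Y.T]),  # sum_i g_i Y_i >= y0
    ])
    b_ub = np.r_[np.zeros(X.shape[1]), -y0]
    ones = np.r_[0.0, np.ones(n)][None, :]
    A_eq = b_eq = None
    if rts == "vrs":
        A_eq, b_eq = ones, [1.0]
    elif rts == "nirs":
        A_ub, b_ub = np.vstack([A_ub, ones]), np.r_[b_ub, 1.0]
    elif rts != "crs":
        raise ValueError("rts must be 'crs', 'nirs' or 'vrs'")
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0, None)] * (n + 1), method="highs")
    return res.fun

# Four hypothetical units with one input and one output.
X = [[1.0], [2.0], [4.0], [3.0]]
Y = [[1.0], [3.0], [4.0], [2.0]]
scores = {rts: dea_input_efficiency(X, Y, X[3], Y[3], rts)
          for rts in ("crs", "nirs", "vrs")}
print(scores)
```

For the last unit the CRS and NIRS scores coincide (4/9) and the VRS score is larger (0.5), which reflects the ordering of the three reference sets: the less restrictive the returns-to-scale assumption, the closer the unit is to the estimated frontier.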
We can visualize the efficient DEA-VRS frontier in Figure 6.5, with a zoom
on the core of the cloud of points in Figure 6.6. Several interesting pieces of
information can be obtained by inspecting these figures. For instance, we see that
companies n. 3 and n. 1 are estimated as efficient; they are also isolated, with no
other companies to be benchmarked against.
[Figure 6.5: plot titled "DEA-VRS frontier"; horizontal axis: Input factor; vertical axis: Output factor; observations labelled by company number.]
Figure 6.5. Output factor versus Input factor and DEA frontier for the Italian Motor-vehicle
dataset, year 2000.
In Table 6.14 (second column) we show the FDH (input oriented) efficiency
score computed using the input factor and the output factor (we notice that
the computation of the FDH efficiency scores with 3 inputs and 2 outputs led
[Figure 6.6: plot titled "DEA-VRS frontier" (zoom); horizontal axis: Input factor; vertical axis: Output factor; observations labelled by company number.]
Figure 6.6. Output factor versus Input factor and DEA frontier for the Italian Motor-vehicle
dataset, year 2000: a zoom.
Table 6.14. Bootstrap results: input efficiency scores with VRS model (B=2000) for Italian
motor-vehicle insurers.
[Figure 6.7: panel titled "Boxplots"; horizontal axis: No. Obs.]
Figure 6.7. Boxplots of the input Shephard efficiency scores (VRS). B=2000. Italian
motor-vehicle insurance business (78 obs).
Figure 6.7 illustrates the boxplots of the input Shephard efficiency scores
(the inverse of the Farrell efficiency measures) resulting from our bootstrap
exercise: it clearly appears that firms no. 37, 41 and 40 are the most inefficient.
Figure 6.6 shows, in fact, that unit no. 37 uses a productive mix (as represented
by the ray from the origin which passes through the unit) that is dominated
by the productive mix used by the other firms in the sample. Close to
the mix of this unit is firm no. 41, which is on a ray between firm no. 37 and
firm no. 40. Slightly better is the situation of firms no. 54 (efficiency score
0.4999), 43 (0.4298), 66 (0.4184), 76 (0.3733) and 71 (0.4667), which
use a technology that combines inputs and outputs more efficiently than the
previous firms. Then, we observe firms no. 57, 59, 65, 52 and 70, whose mix
dominates the previous firms and which in turn is dominated by the technology
of firms no. 49, 60, 61, 58 and 62, and finally by the most efficient technology of
units 47, 74 and 75.
The inspection of the last two columns of Table 6.14 reveals that the original
efficiency estimates lie outside the estimated confidence intervals. This is due
to the fact that the original estimates are biased and that the confidence interval
estimates correct for the bias. Finally, we note that for units no. 3 and no. 1 the
estimated confidence intervals do not contain the corresponding bias-corrected
efficiency scores. This is related to the fact that the bias corrections
are made in terms of the original efficiency estimates, while reciprocals are
used to construct the confidence interval estimates, and to the fact that the sample
information in this region is very poor.
risk diversification through pooling. These arguments lead to the prediction that
insurance operations are likely to encounter ranges of production characterised
by increasing returns to scale, permitting some insurers to reduce unit costs by
increasing production, at least within certain limits.
In this section we provide information on whether these critical assumptions
are correct and whether consolidation is likely to be beneficial in Italy, by
applying the econometric methodology described in Chapter 5. In particular
we use the conditional robust efficiency scores, where the external factor is
represented by a proxy of the size of the insurer, i.e. Z is the market share,
to shed light on the impact of size on the performance of the Italian insurers.
This choice seems reasonable, as market share gives an approximation
of the volume of the activity carried out by the insurers and hence of their
dimension. We do not use total costs or incurred losses to proxy size, as we use
these variables in the construction of inputs and outputs.
Note that the analysis we carry out in this section is quite different from the
analysis of Returns to Scale (RTS) done in the previous section. In general,
RTS are properties of the frontier of the production set, and are calculated
assuming the convexity of the production set and using a deterministic estimator
(DEA) which suffers from the curse of dimensionality and from the influence of
extremes/outliers. Here we propose to use robust estimators (such as order-m)
which do not assume any convexity for the production set, and are less
influenced by extreme points. Moreover, the econometric methodology developed
in Chapter 5 gives us the possibility of measuring the impact of size at the level
of the individual firm as well as globally, offering the opportunity of capturing
also local effects if they are present.
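To make the contrast with DEA concrete, the sketch below computes an input-oriented FDH score and a Monte Carlo approximation of the order-m score for one unit. It follows the usual definition of these estimators (minimum, over units producing at least the same outputs, of the largest input ratio, with the order-m version averaging that minimum over random subsets of size m); the function names, the seed and the five data points are hypothetical.

```python
import random

def _input_ratios(x0, y0, X, Y):
    """max_j X_ij / x0_j for every unit i producing at least y0 in all outputs."""
    return [max(xi_j / x0_j for xi_j, x0_j in zip(xi, x0))
            for xi, yi in zip(X, Y)
            if all(yi_k >= y0_k for yi_k, y0_k in zip(yi, y0))]

def fdh_input_efficiency(x0, y0, X, Y):
    """Input-oriented FDH score: no convexity assumption, full frontier."""
    return min(_input_ratios(x0, y0, X, Y))

def order_m_input_efficiency(x0, y0, X, Y, m, n_draws=2000, seed=0):
    """Monte Carlo approximation of the order-m input efficiency score."""
    rng = random.Random(seed)
    ratios = _input_ratios(x0, y0, X, Y)
    # Each draw benchmarks the unit against m units sampled with replacement
    # among those producing at least y0.
    draws = [min(rng.choice(ratios) for _ in range(m)) for _ in range(n_draws)]
    return sum(draws) / n_draws

# Five hypothetical units with one input and one output.
X = [[2.0], [4.0], [3.0], [6.0], [5.0]]
Y = [[1.0], [3.0], [2.0], [3.5], [1.5]]
x0, y0 = X[4], Y[4]

fdh = fdh_input_efficiency(x0, y0, X, Y)
om_small = order_m_input_efficiency(x0, y0, X, Y, m=2)
om_large = order_m_input_efficiency(x0, y0, X, Y, m=50)
print(fdh, om_small, om_large)
```

Because each draw uses only m competitors, a single extreme unit does not determine the benchmark; as m grows, the order-m score approaches the FDH score.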
Interestingly, Figure 6.8 shows the scatterplot and a straight smoothed
nonparametric regression line of the ratios Qz = θ̂n(x, y | z)/θ̂n(x, y) on Z. On
the contrary, Figure 6.9 (top panel) illustrates an increasing nonparametric
regression line of the ratios Qzm = θ̂n,m(x, y | z)/θ̂n,m(x, y) on Z, up to a
market share of around 1. This trend is confirmed by the ratios Qzα reported in the
bottom panel of Figure 6.9. Here we chose m = 35 and α = 0.97, which are robust
at around 10%, and our data-driven nearest-neighbour approach selected
k-NN = 29.
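The last step of this procedure, smoothing the ratios against Z, can be sketched as follows. The smoother used here is a Nadaraya-Watson estimator with a Gaussian kernel and a fixed bandwidth, which is an assumption made for illustration (the text only states that a smoothed nonparametric regression is used); the ratios are synthetic and are built to increase up to Z = 1 and to stay flat afterwards.

```python
import math

def nw_smooth(z_grid, z, q, bandwidth):
    """Nadaraya-Watson regression of q on z, evaluated at the points of z_grid."""
    fitted = []
    for z0 in z_grid:
        weights = [math.exp(-0.5 * ((zi - z0) / bandwidth) ** 2) for zi in z]
        fitted.append(sum(w * qi for w, qi in zip(weights, q)) / sum(weights))
    return fitted

# Synthetic ratios (illustrative only): increasing until Z = 1, flat afterwards.
z = [0.1 * i for i in range(61)]
q = [0.6 + 0.2 * min(zi, 1.0) for zi in z]

fitted = nw_smooth([0.0, 1.5, 3.0, 5.0], z, q, bandwidth=0.3)
print(fitted)
```

An increasing stretch of the fitted line followed by a flat one is the pattern that would be read as an effect of Z confined to small values of Z.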
As extensively explained in Chapter 5, an increasing (decreasing) nonparametric
regression line denotes a negative (positive) effect of the external factor
(Z) on the performance of benchmarked firms, in an input oriented framework.
This case corresponds exactly to the situation described in Section 5.4, where
Figure 5.3 showed no effect while Figure 5.4 showed a negative impact (here
the impact holds up to Z = 1). In this situation, even if there are no
heterogeneous (extreme) units in the input-output space, such extremes may appear in a
more complete space in which external factors have a role. This confirms the
[Figure 6.8: scatterplot with smoothed regression line; horizontal axis: values of Z.]
Figure 6.8. Economies of scale in the Italian motor-vehicle insurance business. Full frontier
case (78 obs). Z = market share.
[Figure 6.9: two panels; the bottom panel is titled "Effect of Z on Order-α frontier", with vertical axis Qzα; horizontal axis: values of Z.]
Figure 6.9. Economies of scale in the Italian motor-vehicle insurance business. Partial Frontiers
case (78 obs). Z = market share.
H0 : E(θ1 ) = E(θ2 )
against
H1 : E(θ1 ) > E(θ2 ),
where E(θ1) is the mean efficiency of generalist insurers and E(θ2) is the mean
efficiency of specialist insurers. The mean of the DEA input efficiency estimates
for the n1 = 22 generalist insurers is 0.7497 (with a std. dev. of 0.2138),
while the mean for the n2 = 56 specialist insurers is 0.6658 (with a std. dev.
of 0.1786). The full sample size is n = n1 + n2 = 78 and the full sample is
denoted by Xn = Xn1 ∪ Xn2. The overall mean efficiency is 0.6894 (with a
std. dev. of 0.1915).
The test statistic we use is the following:
\[ T(X_n) = \frac{ n_1^{-1} \sum_{i \mid (x_i, y_i) \in X_{n_1}} \hat{\theta}(x_i, y_i) }{ n_2^{-1} \sum_{i \mid (x_i, y_i) \in X_{n_2}} \hat{\theta}(x_i, y_i) }, \]
where θ̂(xi, yi) is the input oriented DEA VRS efficiency estimator of the unit
(xi, yi) computed using the full sample as the reference set. When the null is
true, then by construction T(Xn) will be “close” to 1. On the contrary, when
the alternative is true, T(Xn) will be “far” from 1 (i.e. larger than one). The
p-value for this test has the following form:
The value of the test statistic obtained from our sample is Tobs = 1.1261.
The p-value cannot be computed analytically, but again the bootstrap
algorithm of Chapter 3 allows us to approximate it. We generate B
pseudo-samples X_n^{*,b}, b = 1, ..., B under the null hypothesis (i.e., considering that the
2 subsamples come from the same DGP, so we resample from the full sample
Xn). We compute the test statistic T^{*,b} = T(X_n^{*,b}) for each bootstrap sample;
the p-value is then computed as:
\[ p\text{-value} \approx \sum_{b=1}^{B} \frac{ \mathbb{1}\left( T^{*,b} \ge T_{obs} \right) }{B}. \tag{6.8} \]
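The logic of this one-sided test can be sketched with a deliberately simplified resampling scheme. The code below resamples the efficiency scores themselves from the pooled sample, which is a naive stand-in and not the bootstrap algorithm of Chapter 3 (that algorithm generates pseudo-samples of inputs and outputs and recomputes the DEA estimates each time). Function names, the seed and the two groups of scores are hypothetical.

```python
import random
from statistics import mean

def ratio_of_means(theta_1, theta_2):
    """Test statistic: mean efficiency of group 1 over mean efficiency of group 2."""
    return mean(theta_1) / mean(theta_2)

def naive_p_value(theta_1, theta_2, B=2000, seed=0):
    """Upper-tail p-value as in (6.8), with scores resampled from the pooled sample."""
    rng = random.Random(seed)
    pooled = list(theta_1) + list(theta_2)
    n1 = len(theta_1)
    t_obs = ratio_of_means(theta_1, theta_2)
    hits = 0
    for _ in range(B):
        sample = [rng.choice(pooled) for _ in pooled]   # both groups share one DGP under H0
        if ratio_of_means(sample[:n1], sample[n1:]) >= t_obs:
            hits += 1
    return t_obs, hits / B

# Ratio of the two group means reported in the text (rounded means, so about 1.126).
t_from_text = 0.7497 / 0.6658

# Hypothetical efficiency scores for two groups (illustrative only).
group_1 = [0.90, 0.80, 0.85, 0.95, 0.70]
group_2 = [0.60, 0.65, 0.70, 0.55, 0.75, 0.60]
t_obs, p_value = naive_p_value(group_1, group_2)
print(t_from_text, t_obs, p_value)
```

A statistic well above 1 combined with a small share of bootstrap values exceeding it is what leads to rejecting the equality of mean efficiencies.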
Table 6.15. Some descriptive statistics on Age and size (Mktshare) of Italian motor-vehicle
insurers.
while not fined specialist insurers (37 units) have an average efficiency of 0.6652
(with a std. dev. of 0.1679). Hence Tobs = 0.7146/0.6652 = 1.0743.
The estimated p-value of the null hypothesis (equality of means), with B =
2000 bootstrap replications, is 0.7755.
Hence, the H0 of equality of the mean efficiency of fined and not fined
specialist insurers cannot be rejected; even if fined insurers seem to be more
efficient than the not fined ones, this difference is not significant at any
conventional level.
31 The whole documentation is available online, in Italian, on the website of the Italian Antitrust Authority
at: http://www.agcm.it/.
Figure 6.10. Economies of experience in the Italian motor-vehicle insurance business, year
2000 (78 obs.). Z = Age in years from foundation.
6.7 Conclusions
In this chapter we explored some classical issues in the industrial organization
literature, adding empirical evidence on the presence of economies of scale,
economies of scope and experience on a sample of Italian insurers, active in the
motor-vehicle business, for the year 2000.
The methodology presented in Part I of this work offers a rigorous and
easy-to-interpret tool for evaluating the performance of insurers. It may be applied
to monitor the dynamics of the performance of insurers at a regional, national,
European and international level.
We analysed the sensitivity of efficiency scores through bootstrapping and
also found that the Returns to Scale of the frontier of the production set of
Italian insurers are variable.
We showed that the application of the probabilistic approach to introduce
external-environmental variables is useful to monitor the influence of these
factors on the performance of insurers.
Using our approach we tested, with a certain rigor, whether some commonly
assumed economic principles on the insurance industry are empirically
well-grounded.
7.1 Introduction
The notion of efficiency is highly problematic in the economics of science.
While policy makers and scientists are ready to accept that research activity
should be organised in such a way as to avoid inefficiencies and waste of resources,
the exact definition of what counts as efficiency is far from being agreed upon.
Several theoretical and methodological problems are still unsolved.
Any notion of efficiency relates a vector of inputs to a vector of outputs.
Unfortunately, in scientific research all the elements of efficiency (inputs, outputs
and the functional relation between the two) are affected by different kinds of
problems:
definitional problems concern the definition of inputs and outputs and the
identification of the unit of analysis;
measurement problems pertain to the methodologies for collecting inputs
and outputs data as well as comparative issues;
specification problems involve endogeneity, assumptions made and dy-
namic relations.
In the evaluation of productivity, the definition of what counts as an input or
an output of scientific research is one of the most crucial points. From a substantive
perspective all factors can be considered both as input and as output. There are
no definitive answers to this problem. Inputs and outputs have to be defined case
by case, taking into account the purpose of the analysis. The methodology we
apply in this chapter takes the definition of inputs and outputs as given.
A related problem is the identification of the unit of analysis of the scientific
research. While it is true that all researchers are members of an institute or
32 See Luwel (2004) for a discussion and suggestions for a more integrated approach to construct input and
output data.
33 See among others Coelli (1996), Korhonen, Tainio and Wallenius (2001), Thursby and Kemp (2002).
Studies applying DEA to education include Bessent and Bessent (1980); Bessent, Bessent, Kennington and
Reagan (1982); Charnes, Cooper and Rhodes (1978); Färe, Grosskopf and Weber (1989), Grosskopf, Hayes,
Taylor and Weber (1999), Grosskopf and Moutray (2001). Rousseau and Rousseau (1997, 1998) apply DEA
to construct scientometrics indicators and assess research productivity across countries. See also Bonaccorsi
and Daraio (2004) for a selective review.
34 For a review of the econometric approaches to Science and Technology (S&T) systems see Bonaccorsi
and Daraio (2004).
35 On general bibliometric theory and methodology see Narin (1987), Narin, Olivastro and Stevens (1994),
Okubo (1997), Mullins, Snizek and Oehler (1988) and Moed, Glanzel and Schmoch (2004). The bibliometric
literature has discussed at length the characteristics of count data; see e.g. Garfield and Dorof (1992),
Holbrook (1992a, b), Kostoff (1994). Citation data are examined among others in Schubert, Glanzel and
Braun (1988), and in Schubert and Braun (1993, 1996). The contributions by Rosenberg (1991), May (1993),
Taubes (1993) and King (2004), among others, examine the quality of national scientific production.
practice the collection of data on all these outputs is extremely difficult, except
through field surveys on a limited scale. As a result, for large scale investigations,
data on the number of international publications are considered acceptable.
Any meaningful measure of productivity should therefore be generated by a
model of multi-input multi-output production without a fixed functional
specification. Despite the problems recalled above, the idea that scientific production
must exhibit some relation between the resources employed and the output
produced is generally accepted. For practical and policy purposes, simple
measures of the ratio of output to input are considered an indicator of
scientific productivity. As an example, the crude number of papers per researcher,
within relatively homogeneous fields, is considered an acceptable indicator
of productivity.
Some related issues in the economics of science that are relevant from a policy
making perspective, but empirically controversial, are:
(a) the existence of economies of scale in scientific production, i.e. the
positive effect on scientific productivity of concentrating resources in large
institutions or institutes;
(c) the relation between the age structure of researchers and scientific
productivity. While the effect of age on individual productivity
has been widely treated in the literature (see above all Levin and Stephan, 1991),
there is little evidence on the effect of age structure at the level of the research
institute.
Hence, this chapter aims at discussing theoretically size, age and concentra-
tion effects in science. It also provides empirical evidence analysing the case
of the Italian National Research Council (CNR). Founded in 1923, the CNR
(Consiglio Nazionale delle Ricerche) is the most important national research
institution in Italy, spanning many scientific and technological areas.36
The chapter is structured as follows. In the remainder of this section we
discuss size, agglomeration and age effects in science. In Section 7.2 we illustrate
the data used in the application. The following sections report the results of
applying the econometric methodology described in Chapter 5 to the
institutes of the CNR, and provide empirical evidence on these controversial
issues. The chapter ends by deriving some policy implications from the described
empirical evidence.
36 Studies on the Italian CNR include Bonaccorsi and Daraio (2003a, b, 2005). On the efficiency of the
Italian university system see Bonaccorsi, Daraio and Simar (2006).
the manufacturing sector (Scherer 1980; Milgrom and Roberts 1992; Martin
2002). However, the feasibility of these conditions for scientific research is not
guaranteed for several reasons.
In science, the knowledge stored in publications allows division of cognitive
labour to take place in different places and periods of time. Publication is one
of the most important mechanisms for promoting division of cognitive labour.
This means that placing scientists within the same organisational boundaries
is neither a necessary nor a sufficient condition for benefiting from improved
division of labour. There may be a form of division of labour that requires
the establishment of formal collaboration and coordination of tasks between
scientists. Moreover, it is useful to distinguish between division of labour among
peers, among scientists at various stages of their careers, and between scientists and
technicians or assistants. The first type takes the form of personal links, based
on mutual recognition and professional esteem. Only occasionally can one find
the entire web of personal peer relationships included within the boundaries of
a single organisation. A different type of division of labour takes place when
the pattern of personal relations is based on apprenticeship and scientific
leadership and requires long periods of joint work and supervision, normally (but
not necessarily) within the same institution. Because both types of division
of labour require personal in-depth supervision, the size of resulting units is
limited by the ability of research directors to monitor closely the work of their
research students and collaborators and to contribute to their training. In most
scientific fields this amounts to saying that the maximum size is quite small, of the
order of a few units or one or two dozen. Summing up, it is unlikely that division of
labour per se is a source of increasing returns to scale at the level of institutes.
Indivisibility is another condition invoked for sustaining critical mass policy.
In many fields scientific production requires the combination and coordination
of many scientists from different areas, bringing competencies from
complementary fields. However, indivisibility is more important at the level of the
team or laboratory than at the level of the institute or department. This is because
the minimum size of a team or laboratory may be extremely variable across
specific areas within the same fields. In general, this means that economies
of scale may be important up to a threshold level, and then become irrelevant. If
the threshold level is quite small, the practical implication is that even small
institutes may be highly efficient, provided that their teams or labs meet the
minimum requirement.
Access to physical infrastructure is another argument commonly associated
with the call for critical mass and concentration of resources in large institutions.
However, it cannot be invoked as a general argument in favour of large institutes,
as the research instrumentation required varies according to the field of research.
On this important issue the evidence is ambiguous and contradictory.
Agglomeration economies
The notion of scientific districts, clusters, poles of excellence or science areas
has been prominent in national and regional science policy in the last twenty
years. The examples of Silicon Valley and Route 128 (Saxenian, 1996) and the
emergence of technopoles and regional clusters (Castells and Hall, 1994; Cooke
and Morgan, 1998) have catalysed the attention of analysts and policy makers
in all advanced countries. At a regional level the notion of cluster identifies
the co-presence and interaction of diverse subjects such as research and educa-
tional institutions, firms, innovative public administrations, financial services,
technology transfer and other intermediary organisations (Acs, 2000; Scott,
2001). At this level the emphasis is not on clustering of research activities per
se, but on clustering of complementary innovative activities in the same area.
This general notion, however, has also inspired policies of location of research
activities by some large public research institutions. In several countries large
public research institutions have pursued a policy of creating geographical
concentrations of institutes in the same area. For example, in Italy CNR promoted
the creation of Research Areas, large agglomerations of institutes in different
fields within the same physical infrastructure. In France most research institutes
at CNRS and INSERM are located close to one another. Behind these policies there
is the idea that proximity favours scientific productivity, insofar as it maximises
personal interaction, face-to-face communication, on-site demonstrations and
transmission of tacit knowledge, and facilitates identification of
complementary competencies, unintentional exchange of ideas, café phenomena,
and other serendipitous effects. The focus of our discussion is therefore the
notion that concentrating research activities in the same area may bring benefits to
scientific productivity. Here we do not enter into a discussion on more general
policies for clustering and agglomeration of innovative activities.37
Underlying these policies there are some well grounded economic ideas. As
sometimes happens, the original idea is an old one, but it was rediscovered and
enlarged more recently. The implicit economic analogy is with the concept of
external economies, or Marshallian agglomeration economies (Pyke, Becattini
and Sengenberger, 1986). Alfred Marshall observed that the concentration of
a large number of manufacturing firms in the same area (industrial district) is
not due to chance, but reflects the presence of local externalities in the form
of availability of specialised suppliers, highly trained workforce, sources of
innovative ideas. Costs of production are therefore lower in an agglomerated
area than outside it. More importantly, firms in a district enjoy a particular
industrial atmosphere and benefit from processes of collective invention. The
large literature on geography and trade (Krugman, 1991) and the geographical
37 For more on this point and on the recent developments of Economic Geography, see Clark, Feldman
and Gertler (2000).
Age effects
The existence of age effects in scientific production is one of the few
consolidated stylised facts in the economics and sociology of science. The decline
of scientific productivity with age may depend on a variety of factors. On the
one hand, as time goes by the initial differences among scientists in individual
productivity get larger. Most theories of scientific productivity postulate a
stochastic and cumulative mechanism (Simon, 1957) or a Matthew effect
(Merton, 1968), whereby those that gain recognition early in their careers receive
rewards and resources, which will be used to carry out further research. If this
is true, initial differences in individual productivity will tend to grow over
time. Allison and Stewart (1974) found that the Gini index for publications and
citations of scientists monotonically increases over time in a series of cohorts
from the date of the PhD, with the exception of biologists. This evidence is
interpreted as strongly supporting the notion of reinforcement or positive
feedbacks.38 Another way of looking at the problem of age is to model productivity
as the outcome of a number of features that interact multiplicatively, rather
than additively. For example, a model may assume that several elements or
mental factors play a role (e.g. technical ability, finding important problems,
and persistence). As happens in any multiplicative model, the distribution
of productivity is more skewed than the distribution of any of its determinants.
As a result, a cohort of scientists starting with a given distribution will end up
with a more dispersed distribution, and the variance will increase over time. On
the other hand, it is plausible that scientists work on research not only for the
sake of the intrinsic pleasure of scientific puzzle solving, but also in the
expectation of receiving future income. If this investment motivation is correct, it will
inevitably happen, as in any theory of human capital accumulation with a finite
horizon, that the level of investment will decrease when scientists approach the
date of retirement. Models of human capital are central in the theory of the life
cycle of scientists. This life cycle effect was found by Levin and Stephan (1991)
for most scientific areas, with the exception of particle physics. The impact
of age at the level of research organisations is less clear, however. Within an
institute, for example, experienced scientists might compensate for their individual
decline with a well organised activity of training of junior researchers, so that
productivity at the level of the institute is not depressed. Being less creative at
the individual level, they might still be prolific in supporting young researchers
and identifying promising research avenues that they do not pursue personally.
Furthermore, older scientists may have acquired capabilities in managing and
coordinating research teams and laboratories. More generally, little is known
about the pattern with which people of different ages are mixed within research
institutes and the resulting impact on scientific productivity.
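The claim about multiplicative models in the discussion above can be checked with a small simulation. The sketch draws three independent, symmetric factors and compares the skewness of each with the skewness of their product; the choice of three factors, their uniform distribution on (0.5, 1.5) and the sample size are illustrative assumptions only.

```python
import random
from statistics import mean, pstdev

def skewness(values):
    """Sample skewness: mean of the cubed standardised deviations."""
    mu, sd = mean(values), pstdev(values)
    return mean(((v - mu) / sd) ** 3 for v in values)

rng = random.Random(0)
n = 20000
# Three hypothetical determinants of productivity, each symmetric around 1.
factors = [[rng.uniform(0.5, 1.5) for _ in range(n)] for _ in range(3)]
# Productivity as the product of the three determinants.
productivity = [a * b * c for a, b, c in zip(*factors)]

factor_skews = [skewness(f) for f in factors]
product_skew = skewness(productivity)
print(factor_skews, product_skew)
```

Each factor is roughly symmetric (skewness close to zero), while their product is clearly right-skewed and more dispersed, which is the mechanism invoked in the text.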
These problems are becoming critical in science policy given the alarming
evidence on the increasing average age of researchers in most European coun-
tries. For example, in Italy the proportion of professors and researchers in the
age class 24-44 was 60% in 1984 and only 29% in 2001. Those that entered the
academic system in the age class 24-34 were 19% of the total in 1984 and only
5% in 2001 (Avveduto, 2002). To face the problem of ageing of researchers,
there are suggestions that a massive effort should be made by hiring waves of
new researchers in a concentrated period of time, in order to reduce drastically
the average age. While by definition the problem of ageing worsens over time
in the absence of recruitment of many young researchers, it is not at all clear
what the time path of recruitment should be. This chapter analyses thoroughly
the effects of the age structure of researchers on scientific productivity.
38 For more details on positive feedbacks and research productivity in science, see David (1995).
Table 7.2 shows the variables in the dataset (all variables refer to CNR
institutes). We follow the definition of variables described in the CNR Report.
Monetary variables are left in millions of Italian lire (1 euro = 1936.27 lire).
In order to assess the existence of economies of scale related to agglomeration
effects in the scientific production of CNR institutes, we apply the econometric
methodology described in Chapter 5 of this work, considering the impact of a
bivariate external factor (Z) composed of a proxy of the size of the institute and a
proxy of the concentration of the institutes in the same geographic area.
To account for the influence of proximity between research institutes we
constructed the Geographical Agglomeration Index (GAI) as follows. To each
institute we assigned one point for each other CNR institute located in the
same city that is not of the same research aggregation, and two points for each
other CNR institute located in the same city that is also of the same research
aggregation as the institute considered. We thus obtained a GAI that ranges from
39 to 1, varying between 39 and 33 for the institutes located in Rome, from 23
to 20 for the institutes located in Naples, from 16 to 14 for the institutes located
in Pisa, and so on. An institute has a GAI of 1 if it is the only CNR institute in
its own town.
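The scoring rule can be written down in a few lines. In the sketch below the baseline of 1 for an isolated institute is inferred from the statement that the only CNR institute in a town has a GAI of 1; the function name, the institute names, the cities and the research aggregations are all hypothetical.

```python
def geographical_agglomeration_index(institutes):
    """Compute the GAI for a list of (name, city, aggregation) tuples."""
    gai = {}
    for name, city, aggregation in institutes:
        points = 0
        for other, other_city, other_aggregation in institutes:
            if other == name or other_city != city:
                continue                      # only other institutes in the same city count
            points += 2 if other_aggregation == aggregation else 1
        gai[name] = 1 + points                # assumed baseline: an isolated institute scores 1
    return gai

# Hypothetical institutes (illustrative only).
institutes = [
    ("A", "Pisa", "physics"),
    ("B", "Pisa", "physics"),
    ("C", "Pisa", "chemistry"),
    ("D", "Lucca", "physics"),
]
gai = geographical_agglomeration_index(institutes)
print(gai)  # {'A': 4, 'B': 4, 'C': 3, 'D': 1}
```

Institutes A and B each collect two points from the other physics institute and one from C; C collects one point from each of A and B; D is alone in its city.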
VARIABLE    DEFINITION
TPERS       Total number of personnel
RESFUN      Total research funds
NRESFUN     Research funds obtained from the state
MRESFUN     Research funds obtained from the market
TCOS        Total costs
LABCOS      Labour costs
TRES        Total number of Researchers
TECH        Number of Technicians
ADM         Number of Administrative Staff
ADTECH      Number of Adm. Staff and Technicians
TPUB        Total number of publications
PINTPUB     Percent international publications
INTPUB      Number of International Publications
PUBPERS     Publications per capita
IPUPERS     International Publications per capita
PUBRES      Publications per researcher
IPURES      International Publications per researcher
PMARFUN     Percent of funds raised from the market
PINV        Percent of Total costs allocated to investment
COPUB       Cost per publication
COPUBINT    Cost per international publication
AVIM        Average Impact factor
INSTAG      Institute age (based on an estimate of the date of foundation)
GAI         Geographical Agglomeration Index
TRESAG      Average age of Researchers (all types)
In order to use this measure in our analysis, which requires continuous
variables, we made the GAI continuous by simply adding a small
number randomly chosen from the continuous uniform distribution on the
interval [-0.499, +0.499]. Since the perturbation is smaller than 0.5 in absolute
value, this transformation does not affect the information carried by the GAI
indicator (rounding recovers the original integer value) and is suitable for our
procedure.
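A minimal sketch of this perturbation is given below, using the interval stated in the text. The function name, the seed and the example GAI values are hypothetical.

```python
import random

def jitter(values, half_width=0.499, seed=0):
    """Add independent uniform noise on [-half_width, +half_width] to each value."""
    rng = random.Random(seed)
    return [v + rng.uniform(-half_width, half_width) for v in values]

# Hypothetical integer GAI values, including ties (illustrative only).
gai = [1, 1, 14, 16, 20, 23, 33, 39]
gai_continuous = jitter(gai)

# The noise is smaller than 0.5 in absolute value, so rounding recovers the index.
recovered = [round(v) for v in gai_continuous]
print(gai_continuous, recovered)
```

The perturbation breaks ties between institutes with the same index while leaving the ordering of distinct integer values unchanged.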
Ideally, the dimension of a research institute is measured by the space it
has, by the physical infrastructure (e.g. no. of computers), and mostly by
the people that work in it. However, we do not have data on the area and
physical infrastructure of institutes, and we use the number of researchers and
the number of technicians and administrative staff as inputs in our analysis. We
have obtained very similar results using as proxy of size the total number of
people working in the institutes (TPERS), and also using other proxies such as
total costs (TCOS) and labour costs (LABCOS) of institutes.
Variable      Description
Input 1       No. of researchers (TRES)
Input 2       No. of technicians and administrative staff (ADTECH)
Input 3       Normalised research funds (NRESFUN)
Output        Normalised no. of international publications (NINTPUB)
Ex. factor 1  Geographical Agglomeration Index (GAI)
Ex. factor 2  Total no. of personnel (TPERS)
39 For more details on normalization methods see Schubert and Braun (1996).
Figure 7.1. Estimation of the density of Z (Z1 = GAI, Z2 = TPERS) (top panel)
and its contour plot (bottom panel), CNR institutes (169 obs). Axes: values of
Z1 and values of Z2; the vertical axis of the top panel is the density of Z.
In order to have a robust measure of the global impact of dimension and
geographical concentration on the performance of CNR institutes, we choose a
level of robustness of 10%, which yields α = 0.985 and m = 50. The number of
nearest neighbors chosen by our data-driven approach is k-NN = 33. Figure 7.1
illustrates the quality of the estimation of the density of Z obtained with
our bandwidth selection procedure, together with its contour plot. The contour
plot shows that the two external factors, size (Z2 = TPERS) and geographical
concentration (Z1 = GAI), are not correlated.
The main results of our investigation are reported in Figures 7.2 to 7.5.
As Figure 7.2 shows, the ratio of full frontier efficiency estimates, Q^z, is
influenced by some outliers, whereas the robust partial efficiency ratios
Q^z_α and Q^z_m are not. In the full frontier case an inverse U-shaped pattern
appears, driven by extreme values, which is not confirmed by the robust
partial efficiency estimates. This illustrates the usefulness of robust
measures of efficiency.
As we explained at length in Chapter 5 by means of simulated examples, to
detect the global effect of the external factors on the performance of the
analysed units it is useful to examine the surface of the ratios of
conditional to unconditional efficiency measures (Q^z, Q^z_α and Q^z_m) on Z,
as well as the smoothed nonparametric regression of these ratios on Z1 at the
quartiles of Z2 and, vice versa, on Z2 at the quartiles of Z1. In particular,
these plots also shed light on the interactions between external factors.
We recall that in an input oriented framework, as is the case here, an increas-
ing nonparametric regression line indicates an unfavorable external factor, a
decreasing nonparametric regression line points to a favorable external factor,
while a straight nonparametric regression line denotes no effect of the external
factor.
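A minimal sketch of this reading rule is given below. It is not the procedure of Chapter 5: the Gaussian-kernel Nadaraya-Watson smoother with a fixed bandwidth, the endpoint-based classification in `trend`, and the data are all assumptions chosen only to illustrate how an increasing or decreasing smoothed curve is interpreted in the input oriented case.

```python
import math

def nw_smooth(z_obs, q_obs, z_grid, bandwidth):
    """Nadaraya-Watson estimate of E[Q | Z = z] with a Gaussian kernel."""
    fitted = []
    for z0 in z_grid:
        weights = [math.exp(-0.5 * ((z - z0) / bandwidth) ** 2) for z in z_obs]
        total = sum(weights)
        fitted.append(sum(w * q for w, q in zip(weights, q_obs)) / total)
    return fitted

def trend(fitted, tol=1e-3):
    """Crude reading of a fitted curve in the input oriented framework."""
    change = fitted[-1] - fitted[0]
    if change > tol:
        return "unfavorable"   # increasing regression line
    if change < -tol:
        return "favorable"     # decreasing regression line
    return "no effect"         # flat regression line

# Hypothetical ratios Q (conditional / unconditional efficiency) rising with Z
z = [10, 20, 40, 60, 80, 100, 120]
q = [1.00, 1.02, 1.08, 1.15, 1.22, 1.30, 1.41]
curve = nw_smooth(z, q, z_grid=[20, 60, 100], bandwidth=15.0)
label = trend(curve)
```

Comparing only the two ends of the curve is a simplification; in practice the whole shape of the regression line should be inspected, since the effect may change sign over the range of Z.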
Figure 7.2 has shown the usefulness of applying robust (partial) estimators
of efficiency to check whether the impact of external factors on full frontier
estimators is influenced by the presence of outliers, as is the case here.
Therefore, the results reported in Figure 7.3 are not very reliable. Another
striking result is evident from the top panels of Figures 7.4 and 7.5. We see
that geographical concentration has no influence on the performance of CNR
institutes up to a GAI (Z1) of 25. This region of the plots contains 82.84% of
CNR institutes. Only 29 institutes out of 169 have a GAI greater than 26; they
lie in the region of GAI in which the nonparametric regression line is
slightly decreasing, meaning that geographical concentration could have a
positive influence on their performance. Even more interesting are the bottom
panels of Figures 7.4 and 7.5, where the smoothed nonparametric regression
line shows an increasing trend over the whole range of the number of personnel
(TPERS). Interestingly enough, size negatively affects the performance of
all CNR institutes.
In this chapter we confirm the results found in a previous work on Italian and
French research institutes (see Bonaccorsi and Daraio, 2005). Our analysis
suggests that scientific productivity is not favoured by the concentration of
resources into larger, geographically agglomerated institutes.
Figure 7.2. Scale and concentration effects on CNR institutes (169 obs).
Surface of Q^z on Z1 and Z2 (top panel), surface of Q^z_α on Z1 and Z2 (middle
panel), and surface of Q^z_m on Z1 and Z2 (bottom panel). Z1 = GAI,
Z2 = TPERS. α = 0.985 and m = 50.
Figure 7.3. Scale and concentration effects on CNR institutes (169 obs).
Smoothed nonparametric regression of Q^z on Z1 = GAI for the quartiles of Z2
(top panel) and on Z2 = TPERS for the quartiles of Z1 (bottom panel); dashed
line = first quartile, solid line = median and dash-dot line = third quartile.
Figure 7.4. Scale and concentration effects on CNR institutes (169 obs).
Smoothed nonparametric regression of Q^z_α on Z1 = GAI for the quartiles of Z2
(top panel) and on Z2 = TPERS for the quartiles of Z1 (bottom panel); dashed
line = first quartile, solid line = median and dash-dot line = third quartile.
Figure 7.5. Scale and concentration effects on CNR institutes (169 obs).
Smoothed nonparametric regression of Q^z_m on Z1 = GAI for the quartiles of Z2
(top panel) and on Z2 = TPERS for the quartiles of Z1 (bottom panel); dashed
line = first quartile, solid line = median and dash-dot line = third quartile.
40 Here we set a level of robustness at 10% obtaining a value of α = 0.98 and m = 50. The number of
Figure 7.6. Age effects on CNR institutes. FDH (top panel), order-α (middle
panel, "Effect of Z on Order-α frontier") and order-m (bottom panel, "Effect
of Z on Order-m frontier") estimates (169 obs). Z = TRESAG. Horizontal axes:
values of Z; vertical axes: Q^z, Q^z_α and Q^z_m respectively.
institute with higher average age of researchers might have a lower proportion
of younger scientists, that they wouldn’t or couldn’t attract. The average age
of an institute reflects its attractiveness and scientific vitality. In fact, the aver-
age age of existing personnel is lowered each time a young researcher enters
the institute. The higher the scientific prestige of the institute, the
resources available for job positions and the career prospects, the higher the
number of young candidates wishing to enter. The average age may thus be
considered a summary statistic for turnover and attractiveness. The policy
implication of this finding is straightforward: to increase their scientific
productivity, research institutes have to attract young and talented
researchers.
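\[
\beta^{\prime} i_q = 1 \quad (1 \text{ constraint}), \qquad
\Gamma_{12}\, i_q = 0 \quad (p \text{ constraints}), \qquad
\Gamma_{22}\, i_q = 0 \quad (q \text{ constraints}),
\]
where
\[
\Gamma = \begin{pmatrix} \Gamma_{11} & \Gamma_{12} \\ \Gamma_{21} & \Gamma_{22} \end{pmatrix}.
\]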
In the previous section our aim was to analyse the impact of scale and
agglomeration, and then of age, on how resources are allocated to CNR
institutes (that is, on how their inputs are managed). We therefore adopted an
input oriented framework. Now we take the resources allocated to the research
institutes as given (they are not very high compared with other European
countries) and focus on how much the CNR, as a whole, could improve its
efficiency in producing international publications per 100 researchers
(INTPUB100res) and in increasing the services provided (including external
contracts and other services, approximated by the share of funds coming from
external sources, PMARFUN), given the level of cost per 100 employees
(COST100emp) and the research funds per 100 researchers (RESFUN100res) held by
its research institutes. Hence, we adopt an output oriented framework.
The inputs and outputs used in this section are described in Table 7.6. Because
cost structures and publication practices are heterogeneous across scientific
disciplines, we have normalized all variables by dividing them by the mean of
the corresponding scientific area.
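The normalization by scientific area can be sketched as follows. The helper `normalise_by_area`, the area labels and the figures are hypothetical, and we assume here that the arithmetic mean within each area is used.

```python
from collections import defaultdict

def normalise_by_area(values, areas):
    """Divide each value by the mean of the values in the same scientific area."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for v, a in zip(values, areas):
        totals[a] += v
        counts[a] += 1
    return [v / (totals[a] / counts[a]) for v, a in zip(values, areas)]

# Hypothetical publication counts for four institutes in two areas
pubs = [10.0, 30.0, 200.0, 600.0]
areas = ["physics", "physics", "medicine", "medicine"]
norm = normalise_by_area(pubs, areas)
```

After the normalization, institutes from areas with very different publication levels are expressed on a common scale, with an average of one within each area.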
Table 7.6. Definition of inputs and outputs used for the translog estimation
of CNR output distance function.

VARIABLE        DEFINITION
Inputs
COST100emp      Labour cost per 100 employees (thousands of Euros)
RESFUN100res    Research funds per 100 researchers (thousands of Euros)
Outputs
INTPUB100res    Number of international publications per 100 researchers
PMARFUN         Percentage of funds raised from the market

Table 7.7. Output distance function parameter estimates for CNR research institutes.
As is usual in empirical work (see e.g. Coelli and Perelman 1996, 1999, 2000;
Perelman and Santin, 2005; see also Färe and Primont, 1996), we have
mean-corrected all variables prior to estimation, i.e., each output and input
variable has been divided by its geometric mean. As a result, the first order
coefficients may be interpreted as distance elasticities evaluated at the
sample mean.
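The mean correction can be illustrated with a short sketch (function names and data are hypothetical). After dividing by the geometric mean, the logarithms of a variable average to zero, so at the sample mean point all log terms vanish and only the first order coefficients remain in the derivatives of the translog function.

```python
import math

def geometric_mean(values):
    """Geometric mean of strictly positive values."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

def mean_correct(values):
    """Divide each observation by the geometric mean of the variable."""
    g = geometric_mean(values)
    return [v / g for v in values]

x = [2.0, 8.0, 4.0]          # hypothetical input levels
x_mc = mean_correct(x)

# The logs of the corrected variable have zero mean.
log_mean = sum(math.log(v) for v in x_mc) / len(x_mc)
```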
The empirical results for the estimated model are presented in Table 7.7. The
parameters are estimated by fitting a translog function to the FDH frontier
(third column), to the order-α frontier with α = 0.975 and the order-m
frontier with m = 35 (fourth and fifth columns, respectively), the latter two
robust at around 10%, and finally by the standard COLS approach.
Table 7.8. Bootstrapped confidence intervals for order-α frontier translog approximation.
α = 0.975.
Tables 7.8 and 7.9 display the results of our bootstrap exercise, done follow-
ing the procedure described in Florens and Simar (2005), to build confidence
intervals on the estimated parameters (see Chapter 4). Here we set B = 1000
bootstrap loops. The covariance matrices used to estimate the normal approxi-
mation confidence interval are not reproduced here to save space.
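As a generic illustration only (this sketch does not reproduce the procedure of Florens and Simar, 2005), a normal approximation interval for a single coefficient can be built from its bootstrap replicates as follows. The replicates are simulated here, and the point estimate of -0.30 is a hypothetical value.

```python
import random
import statistics

def normal_approx_ci(estimate, boot_replicates, z=1.96):
    """Normal approximation interval: estimate +/- z * bootstrap std. error."""
    se = statistics.stdev(boot_replicates)
    return estimate - z * se, estimate + z * se

B = 1000                       # number of bootstrap loops, as in the text
rng = random.Random(1)         # fixed seed for reproducibility of the sketch
estimate = -0.30               # hypothetical first order coefficient
# Simulated stand-in for the B bootstrap estimates of the coefficient
replicates = [estimate + rng.gauss(0.0, 0.05) for _ in range(B)]

lower, upper = normal_approx_ci(estimate, replicates)
```

With several parameters, the standard errors would be taken from the diagonal of the bootstrap covariance matrix instead of a single standard deviation.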
The Scale Elasticity (SE), evaluated at the sample mean, is given by:
\[
SE = -\sum_{k=1}^{p} \frac{\partial \ln \delta^{o}(x, y)}{\partial \ln x_k}.
\]
It is the negative of the sum of the input elasticities. Therefore, increasing
(decreasing) scale economies are indicated by a value of SE greater (less) than
one. The scale elasticity, at the approximation point, is equal to the following
values, according to the estimator used in the first stage.
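The computation can be sketched as follows, assuming a translog of the form ln δ^o = α_0 + α' ln x + β' ln y + ½ (ln x, ln y) Γ (ln x, ln y)' with symmetric Γ. This form, the function names and all coefficient values are assumptions made for the illustration; they are not the estimates of Table 7.7.

```python
def input_elasticities(alpha, gamma11, gamma12, ln_x, ln_y):
    """d ln(delta) / d ln(x_k) = alpha_k + Gamma11[k] . ln x + Gamma12[k] . ln y."""
    out = []
    for k, a in enumerate(alpha):
        e = a
        e += sum(g * lx for g, lx in zip(gamma11[k], ln_x))
        e += sum(g * ly for g, ly in zip(gamma12[k], ln_y))
        out.append(e)
    return out

def scale_elasticity(alpha, gamma11, gamma12, ln_x, ln_y):
    """SE is the negative of the sum of the input elasticities."""
    return -sum(input_elasticities(alpha, gamma11, gamma12, ln_x, ln_y))

# Hypothetical coefficients for p = 2 inputs and q = 2 outputs
alpha = [-0.30, -0.20]
gamma11 = [[0.05, -0.01], [-0.01, 0.03]]
gamma12 = [[0.02, -0.02], [0.01, -0.01]]

# With mean-corrected data, all logs are zero at the sample mean,
# so only the first order coefficients contribute.
se_at_mean = scale_elasticity(alpha, gamma11, gamma12, [0.0, 0.0], [0.0, 0.0])
```

In this hypothetical case the scale elasticity at the mean is 0.5, which would be read as decreasing returns to scale since it is less than one.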
Table 7.9. Bootstrapped confidence intervals for order-m frontier translog approximation.
m = 35.
As we explained in Sections 4.6 and 4.7, the COLS estimator (which here yields
an SE of 0.1746) is a poor choice, and the FDH estimator in the first stage
may be influenced by outliers; hence it is always useful to compare these
results with those of the robust partial estimators (order-m or order-α).
We observe that all results are of the same order. In all cases the SE is less
than one, indicating the presence of decreasing returns to scale at the mean
of the Italian CNR institutes. Hence, the result of decreasing returns to
scale for the CNR as a whole appears quite stable across robust and full
frontier estimates.
The inspection of Tables 7.8 and 7.9 shows other interesting evidence:
7.6 Conclusions
In the following we report the main findings of the analysis carried out on
the Italian CNR research institutes.
A striking result of our analysis is that size negatively affects the
performance of all CNR institutes. We illustrated graphically that the
majority of Italian research institutes operate with decreasing returns to
scale.
41 For a comparative detailed review of the elasticities of complementarity and substitution see Cantarelli
(2005). On elasticities of substitution and complementarity see also Bertoletti (2005).
42 See Grosskopf, Margaritis and Valdmanis (1995) for the computation of marginal rates of transformation
and Morishima elasticities of substitution among the units of a public system, within a distance function
framework.
We did not find strong support for agglomeration effects. The argument
that scientific productivity is favoured by the geographical agglomeration of
institutes in the same area did not receive empirical support from our data.
Based on detailed evidence at the micro level on research institutes we
showed that scientific productivity declines with the average age of researchers
of the institute. Nevertheless, the key problem is not the declining individual
productivity, but rather the fact that as time goes on, it becomes increasingly
difficult to create the research climate within scientific institutions that attracts
young and talented scientists. The turnover of scientific personnel must be kept
high on a permanent basis.
The result on decreasing returns to scale was also confirmed by the robust
estimation of the scale elasticity for the average of CNR institutes (around
0.50). By applying the new two-step procedure introduced by Florens and
Simar (2005) and Daouia, Florens and Simar (2005), and extended in Section 4.7,
we were able to estimate confidence intervals for the average scale elasticity
of CNR institutes for both the order-α and the order-m robust estimators.
Finally, by doing so, we showed that the parametric approximation of robust
and nonparametric frontiers is also feasible in a multi-output framework, by
estimating robust parameters of a multi-output translog distance function,
which have better properties than the traditional parametric estimates.
The calculation of elasticities of substitution, of marginal rates of
transformation and the like is straightforward and is left to other
applications.
Chapter 8
8.1 Introduction
The development of personal finance and recent trends in retirement planning
have renewed interest in wealth allocation across asset categories and
detailed investments. Consequently, mutual fund investment companies have
become an increasingly popular channel for capital appreciation and income
generation.
However, the identification of superior performing funds remains a
controversial topic due to the volatile nature of individual fund performance
and the methodological problems that compromise the findings of empirical
studies.
There is a growing literature on the evaluation of the performance of mutual
funds, which deals with both its methodological and its empirical aspects.43
Recently, several works have applied efficiency and productivity techniques
for evaluating the performance of mutual funds. Studies which apply the para-
metric approach include, for instance: Annaert, van den Broeck and Vennet
(1999), Briec and Lesourd (2000). Among the applications of the nonpara-
metric efficiency analysis approach there are: Murthi, Choi and Desai (1997);
Morey and Morey (1999); Sengupta (2000); Basso and Funari (2001); Wilkens
and Zhu (2001); Daraio and Simar (2004). Indeed the estimation of efficient
boundaries arises in portfolio management, as well as in the production frame-
work. In fact, in Capital Assets Pricing Models (CAPM, Markowitz, 1952,
1959) the goal is to study the performance of investment portfolios. Risk
(volatility or variance) and average return on a portfolio act like inputs and
43 Several surveys are available. See e.g. Shukla and Trzcinca (1992), Ippolito (1993), Grinblatt and Titman
(1995), Cesari and Panetta (2002).
194 Exploring the effects of manager tenure, fund age and their interaction
economies of scale and the influence of market risks on these funds and com-
pared with other US mutual funds categories by objective.
In this chapter we focus our empirical investigation on the impact on mutual
fund performance of some new management variables, namely manager tenure and
fund age (age from inception date), and of their interaction.
In particular, manager tenure is a measure of the manager’s survivorship at
the job (Golec, 1996). Long tenure implies that the management company
finds the manager’s ability and performance satisfactory, but it may also
indicate that the manager has few better opportunities because of specialized
skills or a modest performance record. Age of fund provides a measure of the
fund’s longevity or ability to survive in a highly competitive environment. It
is simply the number of months that a fund has been in operation.
According to human capital theory, managers with greater human capital
(intelligence and so on) should deliver better performance and hence should
receive higher compensation. Moreover, performance, risks and fees of
mutual funds are all interrelated; consistent with several agency models (see
Golec, 1996 and the references cited there), a manager’s portfolio risk
choices will partly depend upon his or her risk taking preferences, because
the volatility of a manager’s pay is influenced by the portfolio’s
performance.
There is a rich literature on the relation between fund manager tenure and
mutual fund performance.
For years, economists have debated whether it is possible for mutual fund
managers to “beat the market”, either through superior stock selection abilities,
or by correctly predicting the timing of overall market advances and declines.
Chevalier and Ellison (1999a) examine whether mutual fund performance is
related to characteristics of fund managers that may indicate ability,
knowledge, or effort. During the time period they study, there is a strong
correlation between fund returns and a manager’s age, the average SAT score of
his or her undergraduate school, and whether he or she holds an MBA. For an
interesting analysis of the labour market for mutual fund managers and of
managers’ responses to the implicit incentives created by their career
concerns, see Chevalier and Ellison (1999b), who also find that managerial
turnover is sensitive to a fund’s recent performance.
Nevertheless, the evidence is inconclusive regarding manager tenure and
performance.
On the one hand, Lemak and Satish (1996) found that longer-term managers tend
to outperform shorter-term fund managers and that longer-term fund managers
assemble less risky portfolios. Golec (1996) illustrated that manager tenure
is the most significant predictor of performance and found that longer-term
managers, with more than seven years of tenure, have better risk-adjusted
performance. He also showed that performance is positively related to higher
management fees and turnover ratios. Khorana (1996) added empirical evidence
on the underperformance of fund managers who are replaced or terminated.
Furthermore, he showed that departing managers exhibit higher portfolio
turnover rates and higher expenses relative to non-replaced managers.
On the other hand, Porter and Trifts (1998) found that managers with ten-year
track records do not perform better than those with shorter track records.
Along the same lines, Detzel and Weigand (1998) and Fortin, Michelson and
Wagner (1999) showed that manager tenure is not related to performance.
Summing up, whether mutual fund managers produce greater returns is
controversial because the funds, the sample periods, and the methodological
assumptions of the performance measures adopted in most studies are not
comparable.
We add new empirical evidence on this very interesting and contentious issue
using our nonparametric and robust approach. In this chapter, by providing a
cross-sectional analysis, we reduce the extent of fund changes and the
survivorship bias which affect empirical studies focusing on long time
periods. The chapter unfolds as follows. The next section describes the data.
The following sections report the results of our empirical investigation, and
then we conclude.
Risk is the standard deviation of Return; it depicts how widely the returns
varied over a certain period of time. It is computed using the trailing
monthly total returns for 3 years. All of the monthly standard deviations
are then annualized. Standard deviation of return is an absolute measure
of volatility. It offers a probable range within which a fund’s realized
return is likely to deviate from its expected return.
Transaction costs are the sum of Expense Ratio, Loads and Turnover Ratio.
Expense Ratio is the percentage of fund assets paid for operating expenses and
management fees, including 12b-1 fees (the annual charge deducted from
fund assets to pay for distribution and marketing costs), administrative
fees, and all other asset-based costs incurred by the fund except brokerage
costs. Sales charges are not included in the expense ratio.
Loads have been obtained by summing the Front-End Load and the Deferred
Load of each fund. The Front-End Load is the initial sales charge, which
consists of a one-time deduction from an investment made into the fund.
The amount is generally relative to the amount of the investment, so that
larger investments incur smaller rates of charge. The sales charge serves
as a commission for the broker who sold the fund. The Deferred Loads
are also known as back-end sales charges and are imposed when investors
redeem shares. The percentage charged generally declines the longer
shares are held.
Turnover ratio is a measure of the fund’s trading activity, computed by
taking the lesser of purchases or sales and dividing by average monthly
net assets. It gives an indication of trading activity: funds with higher
turnover (implying more trading activity) incur greater brokerage fees for
effecting the trades. It is also an indication of management strategy: a low
turnover figure would indicate a “buy-and-hold strategy”; high turnover
would indicate an investment strategy involving “considerable buying and
selling” of securities.
Market risks reflect the percentage of a fund’s movements that can be
explained by movements in its benchmark index. Morningstar compares
all equity funds to the S&P 500 index and all fixed-income funds to the
Lehman Brothers Aggregate Bond Index. It is calculated on a monthly
basis, based on a least-squares regression of the fund’s returns on the
returns of the fund’s benchmark index.
Manager tenure is the number of years that the current manager has been the
portfolio manager of the fund. For funds with more than one manager, the
average tenure is reported.
Fund inception date is the date on which the fund began its operations. We
use this information to compute the age in months of mutual funds at April
2002.
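Some of these definitions translate directly into simple computations, sketched below. The function names and data are hypothetical. Two assumptions are made: the annualization multiplies the monthly standard deviation by the square root of 12 (a common convention, not necessarily the one used by the data provider), and the share of movements explained by the benchmark is computed as the R-squared of a simple least-squares regression.

```python
import math
import statistics

def annualised_sd(monthly_returns):
    """Monthly standard deviation scaled by sqrt(12) (assumed convention)."""
    return statistics.stdev(monthly_returns) * math.sqrt(12)

def r_squared(fund_returns, benchmark_returns):
    """R-squared of the least-squares regression of fund on benchmark returns."""
    mf = statistics.fmean(fund_returns)
    mb = statistics.fmean(benchmark_returns)
    sxy = sum((b - mb) * (f - mf) for f, b in zip(fund_returns, benchmark_returns))
    sxx = sum((b - mb) ** 2 for b in benchmark_returns)
    syy = sum((f - mf) ** 2 for f in fund_returns)
    return sxy ** 2 / (sxx * syy)

def turnover_ratio(purchases, sales, avg_net_assets):
    """Lesser of purchases or sales divided by average monthly net assets."""
    return min(purchases, sales) / avg_net_assets

# Hypothetical monthly returns: the fund moves exactly with its benchmark
bench = [1.0, -2.0, 3.0, 0.5, -1.5, 2.0]
fund = [2 * b + 1 for b in bench]
market_risk_pct = 100 * r_squared(fund, bench)   # close to 100 here
turnover = turnover_ratio(purchases=120.0, sales=80.0, avg_net_assets=100.0)
```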
Table 8.1. Descriptive statistics on inputs, output, external factors for Aggressive Growth mutual
funds (AG117).
The average manager tenure is around 5 years, while the average fund age is
about 113 months from inception date. The ranges of variation are particularly
broad: manager tenure goes from half a year to 30 years, age of fund from 29
to 544 months.44
The variables presented in Table 8.1 are highly skewed; this is also confirmed
by the scatterplot matrices reported in Figures 8.1 and 8.2 where, along the
diagonals of the matrices, are also reported the histograms of all variables.
Figure 8.3 shows the scatterplots of Z1 (manager tenure) and Z2 (fund age)
against the various inputs.
These plots show no particular structures or relationships among the
variables, while some extreme points are clearly highlighted. This evidence
calls for the use of robust methods, which we apply in the remainder of the
chapter to avoid the influence of these outlying points on the efficiency
comparisons.
44 In order to use the methodology described in Chapter 5, we have made the last two variables (manager
tenure and fund age) continuous by simply adding a small number randomly chosen from the continuous
uniform distribution on the interval [-0.499,+0.499]. This transformation does not affect the values and is
suitable for our procedure, which requires the use of continuous variables.
Figure 8.1. Scatterplot matrix of inputs (X1, X2, X3) and output (Y) for
Aggressive Growth Mutual Funds (AG117).
Figure 8.2. Scatterplot matrix of output (Y) and external factors (Z) for
Aggressive Growth Mutual Funds (AG117). Z1 is manager tenure and Z2 is fund
age in months.
This result supports the findings of Porter and Trifts (1998), Detzel and
Weigand (1998) and Fortin, Michelson and Wagner (1999), who find that
longer-term managers do not perform better than those with shorter track
records. Nevertheless, as we stated in the introduction, it is difficult to
compare evidence obtained using different datasets, coverage and, above all,
different methods. The techniques most used in empirical finance are ordinary
least squares (OLS) and some multi-stage OLS (see e.g., Golec (1996) for a
discussion). By contrast, our flexible approach, robust and nonparametric,
offers the possibility of investigating not only aggregate trends but also
single efficiency
Figure 8.3. Scatterplot matrix of external factors (Z1, Z2) and inputs (X) for
Aggressive Growth Mutual Funds (AG117).
Figure 8.4. Influence of manager tenure on the performance of Aggressive
Growth (AG117) Mutual Funds. Panel title: Effect of Z on Order-m frontier;
horizontal axis: values of Z; vertical axis: Q^z_m.
Figure 8.5. Influence of manager tenure on the performance of Aggressive
Growth (AG117) Mutual Funds. A zoom on mutual funds with a manager tenure
lower than 13 years. Panel title: Effect of Z on Order-m frontier; vertical
axis: Q^z_m.
Q^z_m < 1 (15 obs)
          Npoint   θ(x, y)   α(x, y)   Nemp     θ_m,n(x, y)
Mean      1        0.82      0.98      56       1.13
St.dev.   2        0.22      0.03      31       0.42
Min       0        0.51      0.91      14       0.65
Max       6        1.00      1.00      115      2.46

Q^z_m > 1 (26 obs)
          Npoint   θ(x, y)   α(x, y)   Nemp     θ_m,n(x, y)
Mean      15       0.67      0.80      69       0.76
St.dev.   15       0.13      0.16      32       0.13
Min       1        0.51      0.41      13       0.59
Max       56       0.97      0.99      117      1.04

Q^z_m = 1 (76 obs)
          Npoint   θ(x, y)   α(x, y)   Nemp     θ_m,n(x, y)
Mean      7        0.76      0.89      56       0.85
St.dev.   11       0.17      0.13      34       0.15
Min       0        0.47      0.43      1        0.53
Max       60       1.00      1.00      114      1.16

ALL (117)
          Npoint   θ(x, y)   α(x, y)   Nemp     θ_m,n(x, y)
Mean      8.31     0.75      0.88      59.05    0.87
St.dev.   12.46    0.17      0.14      33.76    0.23
Min       0.00     0.47      0.41      1.00     0.53
Max       60.00    1.00      1.00      117.00   2.46
Npoint is the number of points which dominate the analysed fund; θ(x, y) is
the FDH input efficiency measure; α(x, y) is the input oriented probabilistic
measure introduced in Section 4.3, i.e. the order α of the estimated quantile
frontier which passes through the unit (x, y). For instance, α(x, y) = 0.98
means that only 2% of funds with a return at least equal to y are using less
inputs than the unit (x, y). Hence, 1 − α(x, y) gives an estimate of the
probability that unit (x, y) is dominated, given its level of y. Nemp is the
number of units used to estimate the distribution function and, finally,
θ_m,n(x, y) is the input order-m efficiency measure, where m = 25 is robust at
10%.
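These indicators can be illustrated on a toy data set. The sketch below is a simplified reading of the definitions above for a single output, not the estimators of Chapter 4: in particular, we assume that Nemp counts the units with output at least y (including the unit itself) and that 1 − α(x, y) is the share of those units which dominate (x, y). The data and function names are hypothetical.

```python
def dominates(xi, yi, x, y):
    """Unit i uses no more of every input and produces at least output y."""
    return all(a <= b for a, b in zip(xi, x)) and yi >= y

def fdh_input_efficiency(X, Y, x, y):
    """FDH input measure: min over units with Y_i >= y of max_j X_ij / x_j."""
    return min(
        max(a / b for a, b in zip(xi, x))
        for xi, yi in zip(X, Y)
        if yi >= y
    )

def indicators(X, Y, idx):
    """Return (Npoint, Nemp, alpha, theta) for the unit with index idx."""
    x, y = X[idx], Y[idx]
    npoint = sum(
        dominates(xi, yi, x, y)
        for i, (xi, yi) in enumerate(zip(X, Y))
        if i != idx
    )
    nemp = sum(yi >= y for yi in Y)          # units producing at least y
    alpha = 1 - npoint / nemp                # assumed empirical version
    theta = fdh_input_efficiency(X, Y, x, y)
    return npoint, nemp, alpha, theta

# Four hypothetical funds with two inputs and one output
X = [(2.0, 4.0), (4.0, 2.0), (5.0, 5.0), (3.0, 3.0)]
Y = [10.0, 10.0, 10.0, 12.0]

worst = indicators(X, Y, 2)   # fund (5, 5) is dominated by the three others
best = indicators(X, Y, 3)    # fund (3, 3) has the highest output
```

In this example the third fund is dominated by all three other funds and could reduce both inputs to 60% of their level, while the fourth fund, which has the highest output, lies on the FDH frontier.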
Table 8.3 presents some conditional measures of efficiency and indicators
for the same groups of AG funds, distinguished by the role of manager tenure
(given by the values of Q^z_m >, <, = 1), as in Table 8.2. ZNpoint is the
number of points which dominate the analysed fund given that Z = z; θ(x, y|z)
is the conditional FDH input efficiency measure; α^z(x, y) is the conditional
input oriented probabilistic measure introduced in Section 5.2.3, i.e. the
order α of the estimated conditional quantile frontier which passes through
the unit (x, y) given that Z = z. Hence, 1 − α^z(x, y) gives an estimate of
the probability that unit (x, y) is dominated, given its level of y and the
condition Z = z. θ_m,n(x, y|z) is the conditional input order-25 efficiency
measure and Q^z_m = θ_m,n(x, y|z)/θ_m,n(x, y). EI^z_m = E(Q^z_m | Z = z) is
the externality index defined in Chapter 5, II^z_m = Q^z_m / E(Q^z_m | Z = z)
is the individual index and, finally, αQ^z = α^z(x, y)/α(x, y).
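The relation between these indicators can be sketched as follows. The efficiency scores and tenure values are hypothetical, and the conditional expectation E(Q^z_m | Z = z) is approximated here by a Gaussian-kernel smoother with a fixed bandwidth, which is an assumption of this illustration rather than the estimator defined in Chapter 5.

```python
import math

def smooth_at(z_obs, q_obs, z0, bandwidth):
    """Kernel-weighted average of Q around z0, approximating E(Q | Z = z0)."""
    weights = [math.exp(-0.5 * ((z - z0) / bandwidth) ** 2) for z in z_obs]
    return sum(w * q for w, q in zip(weights, q_obs)) / sum(weights)

# Hypothetical order-m scores for five funds and their manager tenure (years)
theta_cond = [0.90, 0.85, 1.05, 0.80, 0.95]     # conditional scores
theta_uncond = [1.00, 0.85, 0.95, 0.82, 0.90]   # unconditional scores
tenure = [2.0, 4.0, 6.0, 9.0, 15.0]

# Ratio of conditional to unconditional efficiency
Q = [c / u for c, u in zip(theta_cond, theta_uncond)]
# Externality index: expected ratio at each fund's own tenure
EI = [smooth_at(tenure, Q, z0, bandwidth=3.0) for z0 in tenure]
# Individual index: the fund's ratio relative to the expected ratio
II = [q / e for q, e in zip(Q, EI)]
```

By construction the ratio of each fund is the product of the externality index and the individual index, so the individual index isolates how far a fund departs from what is expected at its level of Z.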
To characterize the different groups of funds according to the different impact
of manager tenure, we provide in Table 8.4 some descriptive statistics which
have to be related to the several efficiency measures and indicators presented
above.
Table 8.3. Conditional measures of efficiency and various indicators (Z is manager tenure), by
groups of Aggressive Growth Mutual Funds (AG117).

Q^z_m < 1 (15 obs)
          ZNpoint  θ(x, y|z)  α^z(x, y)  θ_m,n(x, y|z)  Q^z_m  EI^z_m  II^z_m  αQ^z
Mean      0.93     0.82       0.97       1.03           0.91   1.00    0.92    0.99
St.dev.   1.48     0.22       0.05       0.37           0.03   0.03    0.04    0.02
Min       0.00     0.51       0.85       0.61           0.84   0.92    0.85    0.94
Max       5.00     1.00       1.00       2.25           0.95   1.04    1.00    1.00

Q^z_m > 1 (26 obs)
          ZNpoint  θ(x, y|z)  α^z(x, y)  θ_m,n(x, y|z)  Q^z_m  EI^z_m  II^z_m  αQ^z
Mean      8.12     0.74       0.85       0.85           1.11   1.03    1.08    1.07
St.dev.   9.49     0.17       0.16       0.13           0.07   0.02    0.07    0.05
Min       0.00     0.51       0.46       0.63           1.05   1.00    1.00    0.98
Max       32.00    1.00       1.00       1.11           1.33   1.11    1.30    1.17

Q^z_m = 1 (76 obs)
          ZNpoint  θ(x, y|z)  α^z(x, y)  θ_m,n(x, y|z)  Q^z_m  EI^z_m  II^z_m  αQ^z
Mean      4.91     0.78       0.88       0.85           1.00   1.01    0.99    0.99
St.dev.   6.89     0.17       0.13       0.15           0.02   0.02    0.03    0.04
Min       0.00     0.47       0.42       0.50           0.95   0.95    0.92    0.86
Max       37.00    1.00       1.00       1.11           1.05   1.05    1.05    1.14

ALL (117)
          ZNpoint  θ(x, y|z)  α^z(x, y)  θ_m,n(x, y|z)  Q^z_m  EI^z_m  II^z_m  αQ^z
Mean      5.11     0.77       0.88       0.87           1.0    1.01    1.00    1.01
St.dev.   7.44     0.18       0.13       0.20           0.07   0.03    0.07    0.05
Min       0.00     0.47       0.42       0.50           0.84   0.92    0.85    0.86
Max       37.00    1.00       1.00       2.25           1.33   1.11    1.30    1.17
Quite the opposite profile is that of the AG funds with Q^z_m > 1, i.e. funds
whose efficiency conditional on manager tenure is higher than the
unconditional one (see the efficiency measures reported above, from the 6th to
the 10th line from the top of Tables 8.2 and 8.3). These funds have an average
manager tenure of about 6 years; their market risk, fund age and size are much
the same as the average of the whole sample, but they show a lower average
return and a higher expense ratio and turnover ratio (182 against 153).
Table 8.4. Some descriptive statistics by group of funds with different impact of manager
tenure. Z is manager tenure.

Q^z_m < 1 (15 obs)
          Risk    Turnover  Expense  T Return  Mkt risks  Manager tenure  Fund age  Size
Mean      31.87   95.07     1.48     82.26     56.00      6.53            103.53    1014.48
St.dev.   9.82    71.03     0.75     8.28      18.25      6.71            65.21     2258.49
Min       14.73   15.00     0.48     62.24     20.00      2.00            46.00     2.30
Max       49.71   290.00    2.62     93.08     91.00      30.00           318.00    8828.10

Q^z_m > 1 (26 obs)
          Risk    Turnover  Expense  T Return  Mkt risks  Manager tenure  Fund age  Size
Mean      36.98   182.31    2.18     77.51     47.58      5.88            102.19    571.74
St.dev.   10.94   136.48    2.53     11.35     13.67      2.85            95.42     1292.05
Min       21.07   50.00     0.92     40.12     6.00       1.00            41.00     0.20
Max       81.05   642.00    14.70    93.42     70.00      16.00           544.00    5324.00

Q^z_m = 1 (76 obs)
          Risk    Turnover  Expense  T Return  Mkt risks  Manager tenure  Fund age  Size
Mean      34.41   155.14    1.62     83.13     45.64      4.34            118.75    325.93
St.dev.   7.45    85.89     0.56     9.48      15.85      2.94            110.42    771.33
Min       17.69   44.00     0.90     64.30     8.00       1.00            29.00     0.20
Max       50.22   482.00    3.80     103.76    100.00     21.00           541.00    4914.30

ALL (117 obs)
          Risk    Turnover  Expense  T Return  Mkt risks  Manager tenure  Fund age  Size
Mean      34.66   153.48    1.73     81.77     47.40      4.97            113.12    468.83
St.dev.   8.79    101.00    1.33     10.06     16.09      3.73            102.70    1210.45
Min       14.73   15.00     0.48     40.12     6.00       1.00            29.00     0.20
Max       81.05   642.00    14.70    103.76    100.00     30.00           544.00    8828.10
Table 8.5. Rank of AG mutual funds with Qzm > 1 ordered by decreasing INCR =
θm,n (x, y|z) − θm,n (x, y). Z is manager tenure.
Table 8.6. Rank of AG mutual funds with Qzm < 1 ordered by decreasing DECR =
θm,n (x, y|z) − θm,n (x, y). Z is manager tenure.
To identify the funds most positively influenced by manager tenure, we report
in Table 8.5 the ranking of US AG funds with Qzm > 1, ordered by decreasing
INCR, the difference between the conditional and the unconditional order-m
efficiency measures (INCR = θm,n (x, y|z) − θm,n (x, y)), together with their
individual efficiency measures. Conversely, Table 8.6 lists the AG funds most
negatively influenced by manager tenure (those with Qzm < 1), ordered by
decreasing DECR (= θm,n (x, y|z) − θm,n (x, y)), together with their individual
efficiency measures.
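The ranking by INCR described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the fund names and efficiency values are invented for the example and are not taken from the data set, rank_by_incr is a hypothetical helper, and Qzm is taken to be the ratio of the conditional to the unconditional order-m efficiency measure.

```python
def rank_by_incr(funds):
    """Rank funds with Q > 1 by decreasing INCR.

    funds: list of (name, theta_cond, theta_unc), where theta_cond stands for
    the conditional order-m efficiency theta_{m,n}(x, y | z) and theta_unc for
    the unconditional one, theta_{m,n}(x, y).
    """
    rows = []
    for name, theta_cond, theta_unc in funds:
        q = theta_cond / theta_unc          # ratio Q (assumed definition)
        if q > 1:                           # keep only positively influenced funds
            rows.append((name, theta_cond - theta_unc, q))
    # Largest increase (INCR) first
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Hypothetical values, for illustration only
funds = [
    ("Fund A", 0.90, 0.75),
    ("Fund B", 0.80, 0.85),
    ("Fund C", 1.00, 0.70),
    ("Fund D", 0.88, 0.88),
]
ranking = rank_by_incr(funds)
for name, incr, q in ranking:
    print(f"{name}: INCR = {incr:.2f}, Q = {q:.2f}")
```

The same difference, restricted to funds with a ratio below one, gives the DECR ordering.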
Hence, we were able to identify the US AG funds that have been most
influenced by manager tenure, and to analyse their different profiles. Of course,
this analysis is not conclusive: other information, such as the age of the
managers or whether they hold an MBA, would have helped to complete our
understanding of the manager tenure effect, but it was not available to us.
Nevertheless, the analysis is informative: even if, globally speaking, manager
tenure does not affect the performance of the analysed funds (see Figures 8.4
and 8.5), our procedure is able to identify the funds on which it had the
largest impact (positive or negative) and lets us characterize their profile.
This is particularly useful in empirical finance for understanding the
management behaviour of stars and best performers.
Another interesting phenomenon is the impact of fund age on performance
and its interaction with manager tenure, which we deal with in the next section.
[Figure: two panels plotted against values of Z; the lower panel is titled "Effect of Z on Order−α frontier" and shows Qzα.]
Figure 8.6. Influence of fund age on the performance of Aggressive Growth (AG117) Mutual
Funds.
[Figure: two panels plotted against values of Z; the lower panel is titled "Effect of Z on Order−α frontier" and shows Qzα.]
Figure 8.7. Influence of fund age on the performance of Aggressive Growth (AG117) Mutual
Funds. A zoom on mutual funds with a fund age lower than 250 months.
The global impact of manager tenure (Z1) and fund age (Z2) on AG funds
is shown in Figures 8.8 and 8.9.
We notice that the correlation between manager tenure and fund age is quite
low (0.26, although significant at the 95% level: p-value 0.004 < 0.05); the
number of nearest neighbours (k-NN) selected by our data-driven method (see
Chapter 5) is 55. See Figure 8.10 for the estimate of the density of Z as well
as its contour plot.
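The statement that a correlation of 0.26 is low yet significant can be checked with a short calculation. The sketch below assumes that the correlation is a Pearson correlation computed on all 117 funds and tested with the usual t statistic, which the text does not state explicitly; the critical value of about 1.98 is the approximate two-sided 5% value for 115 degrees of freedom taken from standard tables.

```python
import math


def correlation_t_statistic(r: float, n: int) -> float:
    """t statistic for H0: rho = 0, given a sample correlation r on n observations."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


r, n = 0.26, 117               # correlation and sample size reported in the text
t_stat = correlation_t_statistic(r, n)

# Approximate two-sided 5% critical value of Student's t with n - 2 = 115
# degrees of freedom (an assumption taken from standard tables).
T_CRIT_5PCT = 1.98
significant = abs(t_stat) > T_CRIT_5PCT

print(f"t = {t_stat:.2f}, significant at 5%: {significant}")
```

With 117 observations even a modest correlation yields a t statistic well above the critical value, which is why a low correlation can still be statistically significant.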
In particular, Figure 8.8 shows the surface of Qzm on Z1 and Z2. Figure
8.9 illustrates the smoothed nonparametric regression of Qzm on Z1 (top panel)
at the quartiles of Z2, and the smoothed nonparametric regression of Qzm on Z2
(bottom panel) at the quartiles of Z1. The dashed line indicates the first
quartile, the solid line the median and the dash-dotted line the third quartile.
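To make the idea of a smoothed nonparametric regression of Qzm on Z concrete, a univariate sketch follows. It assumes a Nadaraya-Watson (local constant) smoother with a Gaussian kernel and a fixed bandwidth; the smoother, the bandwidth and the data points are assumptions made for illustration and are not necessarily those used to draw Figure 8.9.

```python
import math


def nadaraya_watson(z_grid, z_obs, q_obs, bandwidth):
    """Gaussian-kernel local-constant regression of q_obs on z_obs.

    Returns the fitted value at each point of z_grid. Each fitted value is a
    weighted average of q_obs, with weights decreasing in the distance
    between the observed z and the evaluation point.
    """
    fitted = []
    for z0 in z_grid:
        weights = [math.exp(-0.5 * ((z - z0) / bandwidth) ** 2) for z in z_obs]
        total = sum(weights)  # may underflow to 0 if z0 is far from all data
        fitted.append(sum(w * q for w, q in zip(weights, q_obs)) / total)
    return fitted


# Hypothetical manager tenures (years) and ratios Q, for illustration only
tenure = [1.0, 2.0, 3.0, 5.0, 8.0, 12.0, 16.0, 30.0]
q_zm = [1.00, 1.02, 0.99, 1.01, 1.00, 0.98, 1.03, 0.90]
grid = [2.0, 10.0, 25.0]
smooth = nadaraya_watson(grid, tenure, q_zm, bandwidth=4.0)
for z0, value in zip(grid, smooth):
    print(f"Z = {z0:5.1f}  smoothed Q = {value:.3f}")
```

A flat smoothed curve around one indicates no effect of Z, while a curve moving away from one indicates an effect over that range of Z.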
It appears, as we have seen in Figure 8.4, that globally there is no impact of
manager tenure up to a tenure of around 20 years (note that we only have 3 funds
with manager tenure higher than 15 years), and this holds for all quartiles of
fund age (top panel of Figure 8.9). Inspecting the bottom panel of Figure 8.9,
it seems that fund age does not affect the performance of AG funds when it is
combined with the first quartile of manager tenure (dashed line), whereas for
the median of manager tenure (solid line) there is a positive combined effect
for funds older than 400 months, and for longer manager tenure (third quartile,
dash-dotted line) there is a positive effect already for funds older than 100
months. Hence, even though there is almost no global effect of manager tenure,
our procedure allows us to shed light on the interaction between Z1 and Z2. One
interpretation of this plot could be that managers with longer tenure, and hence
more experience, are better able to exploit the ability/experience of funds in
facing highly competitive markets.
[Figure: surface plot of Qzm over Z1 and Z2.]
Figure 8.8. Influence of manager tenure and fund age on the performance of Aggressive Growth
Mutual Funds (AG117). Surface.
[Figure: top panel, Qzm against Z1; bottom panel, Qzm against Z2.]
Figure 8.9. Influence of manager tenure (Z1) and fund age (Z2) on the performance of
Aggressive Growth Mutual Funds (AG117). Plots.
Let us now examine the peculiar behaviour of single funds. As in the previous
section, Table 8.7 shows some descriptive statistics on efficiency measures
and indicators for groups of funds on which manager tenure, considered jointly
with fund age, has a different impact. Table 8.8 is useful to characterize the
profile of these groups of US funds.
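The construction of such group-wise descriptive statistics can be sketched as follows. The sketch assumes that funds are assigned to the group "Q = 1" when their ratio lies within 0.05 of one, a tolerance suggested by the minima and maxima reported in the tables but not stated in the text; the data and the helper names are hypothetical.

```python
import statistics


def group_label(q, tol=0.05):
    """Assign a fund to a group according to its ratio Q (tolerance assumed)."""
    if q < 1 - tol:
        return "Q < 1"
    if q > 1 + tol:
        return "Q > 1"
    return "Q = 1"


def describe(values):
    """Mean, sample standard deviation, minimum and maximum of a list."""
    return {
        "mean": statistics.mean(values),
        "st.dev.": statistics.stdev(values) if len(values) > 1 else 0.0,
        "min": min(values),
        "max": max(values),
    }


def describe_by_group(q_values, variable, tol=0.05):
    """Descriptive statistics of `variable` within each group defined by Q."""
    groups = {}
    for q, value in zip(q_values, variable):
        groups.setdefault(group_label(q, tol), []).append(value)
    return {label: describe(vals) for label, vals in groups.items()}


# Hypothetical ratios and turnover ratios, for illustration only
q_values = [0.90, 0.93, 1.00, 1.02, 0.98, 1.12, 1.20]
turnover = [95.0, 120.0, 150.0, 160.0, 155.0, 180.0, 190.0]
summary = describe_by_group(q_values, turnover)
for label, stats in summary.items():
    print(label, {k: round(v, 2) for k, v in stats.items()})
```

Repeating the call for each variable (risk, expense ratio, return and so on) yields one block of a table of this kind per group.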
Table 8.10 gives the ranking of US AG funds with Qzm > 1, ordered by
decreasing INCR, the difference between the conditional and the unconditional
order-m efficiency measures (INCR = θm,n (x, y|z) − θm,n (x, y)), together with
their individual efficiency measures. Table 8.9 lists the AG funds with
Qzm < 1, ordered by decreasing DECR (= θm,n (x, y|z) − θm,n (x, y)), together
with their individual efficiency measures. Recall that here Z is bivariate,
consisting of manager tenure and fund age. It is also interesting to compare
these results with those of the previous section. In particular, some funds,
such as nos. 1, 2, 7, 12, 19, 20 and 21 in Table 8.10, were not listed in Table
8.5; for these funds, manager tenure has an effect only when it is considered
jointly with fund age. The same holds true for some funds of Table 8.9, such
as nos. 9, 13, 15, 17 and 18, which are not listed in Table 8.6.
[Figure: top panel "Density of Z", a surface over values of Z1 and values of Z2; bottom panel, contour plot with values of Z1 on the horizontal axis and values of Z2 on the vertical axis.]
Figure 8.10. Influence of manager tenure (Z1) and fund age (Z2) on the performance of
Aggressive Growth Mutual Funds (AG117). Density of Z (top panel) and contour plot of the
density of Z (bottom panel).
These considerations are just examples of how our approach can be used to shed
light on individual patterns of efficiency measures and their explanation. As a
matter of fact, we showed that our methodology is well suited to providing
empirical evidence not only on the global financial performance of mutual funds,
but also on single peculiar profiles.
Table 8.7. Some descriptive statistics on efficiency measures and indicators by group of funds
with different impact of manager tenure. Multivariate Z.
Qzm < 1 (19 obs)
          α(x,y)  θm,n(x,y)  αz(x,y)  θm,n(x,y|z)  Qzm   EIzm  IIzm  αQz
Mean      0.96    1.06       0.95     0.95         0.91  1.01  0.90  0.99
St.dev.   0.06    0.40       0.06     0.34         0.04  0.02  0.05  0.02
Min       0.81    0.65       0.82     0.60         0.80  0.93  0.78  0.95
Max       1.00    2.46       1.00     2.16         0.95  1.02  0.98  1.04
Qzm > 1 (28 obs)
          α(x,y)  θm,n(x,y)  αz(x,y)  θm,n(x,y|z)  Qzm   EIzm  IIzm  αQz
Mean      0.75    0.73       0.81     0.82         1.12  1.02  1.10  1.07
St.dev.   0.17    0.12       0.19     0.13         0.10  0.00  0.10  0.06
Min       0.41    0.59       0.42     0.63         1.05  1.01  1.03  0.99
Max       0.97    0.94       1.00     1.01         1.45  1.03  1.43  1.23
Qzm = 1 (70 obs)
          α(x,y)  θm,n(x,y)  αz(x,y)  θm,n(x,y|z)  Qzm   EIzm  IIzm  αQz
Mean      0.91    0.87       0.91     0.87         1.00  1.01  0.99  1.00
St.dev.   0.10    0.14       0.10     0.14         0.02  0.00  0.02  0.04
Min       0.49    0.53       0.53     0.54         0.96  1.00  0.94  0.91
Max       1.00    1.10       1.00     1.08         1.05  1.02  1.04  1.13
ALL (117 obs)
          α(x,y)  θm,n(x,y)  αz(x,y)  θm,n(x,y|z)  Qzm   EIzm  IIzm  αQz
Mean      0.88    0.87       0.89     0.87         1.01  1.01  1.00  1.01
St.dev.   0.14    0.23       0.13     0.19         0.09  0.01  0.09  0.05
Min       0.41    0.53       0.42     0.54         0.80  0.93  0.78  0.91
Max       1.00    2.46       1.00     2.16         1.45  1.03  1.43  1.23
Table 8.8. Some descriptive statistics by group of funds with different impact of manager
tenure. Bivariate Z.
Qzm < 1 (19 obs)
          Risk   Turn.     Exp.     T Ret.    Mkt risks  Manager tenure  Fund age  Size
Mean      33.76  119.84    1.47     81.70     51.74      6.47            111.32    960.77
St.dev.   9.57   97.18     0.69     8.51      18.98      6.04            90.49     2179.79
Min       14.73  15.00     0.48     62.24     20.00      2.00            29.00     2.30
Max       49.71  305.00    2.62     93.08     91.00      30.00           395.00    8828.10
Qzm > 1 (28 obs)
          Risk   Turn.     Exp.     T Ret.    Mkt risks  Manager tenure  Fund age  Size
Mean      37.95  165.32    2.19     78.69     46.50      5.71            70.96     382.11
St.dev.   9.94   74.90     2.43     10.80     15.95      2.76            27.02     845.08
Min       26.11  64.00     1.10     40.12     17.00      1.00            35.00     0.20
Max       81.05  305.00    14.70    95.34     72.00      16.00           149.00    3593.00
Qzm = 1 (70 obs)
          Risk   Turn.     Exp.     T Ret.    Mkt risks  Manager tenure  Fund age  Size
Mean      33.58  157.87    1.61     83.02     46.59      4.26            130.47    370.00
St.dev.   7.68   108.91    0.59     9.87      15.07      3.00            118.75    891.70
Min       17.69  44.00     0.89     51.61     6.00       1.00            41.00     0.20
Max       50.22  642.00    3.80     103.76    100.00     21.00           544.00    5324.00
ALL (117 obs)
          Risk   Turn.     Exp.     T Ret.    Mkt risks  Manager tenure  Fund age  Size
Mean      34.66  153.48    1.73     81.77     47.40      4.97            113.12    468.83
St.dev.   8.79   101.00    1.33     10.06     16.09      3.73            102.70    1210.45
Min       14.73  15.00     0.48     40.12     6.00       1.00            29.00     0.20
Max       81.05  642.00    14.70    103.76    100.00     30.00           544.00    8828.10
Table 8.9. Rank of AG mutual funds with Qzm < 1 ordered by decreasing DECR =
θm,n (x, y|z) − θm,n (x, y). Bivariate Z: Z1 manager tenure, Z2 fund age.
Table 8.10. Rank of AG mutual funds with Qzm > 1 ordered by decreasing INCR =
θm,n (x, y|z) − θm,n (x, y). Bivariate Z: Z1 manager tenure, Z2 fund age.
8.5 Conclusions
In this chapter we analysed US Aggressive Growth mutual funds and the
impact on their performance of two management variables, namely manager tenure
and fund age.
Considering the funds as a whole, we find that there is almost no impact of
manager tenure on mutual fund performance. This result supports the findings of
Porter and Trifts (1998), Detzel and Weigand (1998) and Fortin, Michelson and
Wagner (1999), who find that longer-term managers do not perform better than
those with shorter track records. Nevertheless, as we stated in the introduction,
it is difficult to compare evidence obtained using different datasets, different
coverage and, above all, different methods.
We also analysed the impact of fund age on AG mutual funds, and found no
global effect on the performance of the analysed funds. It seems
that the ability to survive in a highly competitive environment, as measured
by the experience accumulated over a longer number of years in operation, does
not affect the performance of the AG mutual funds as a whole. The time frame
analysed here is a very peculiar one: our data span the terrorist attacks of
11 September 2001, which contributed to the collapse of financial markets in most
advanced countries.
Finally, we investigated the interaction between managerial experience (manager
tenure) and funds' longevity (measured by fund age), and how this interaction
affected the performance of AG mutual funds. We found that managers with longer
tenure, and hence more experience, are better able to exploit the
ability/experience of funds in facing highly competitive markets.
Our flexible approach, robust and nonparametric, offers the possibility of
investigating not only aggregate trends but also single efficiency patterns.
By applying the methodology developed in Chapter 5, we were able to identify
the US AG funds that have been most influenced by manager tenure and its
interaction with fund age, and to analyse their different profiles. Our approach
is, in fact, able to capture the interaction between the components of the
external factors (in this case manager tenure and fund age) at the level of
individual funds, even in the absence of a global impact of these external factors.
Of course, this analysis is not conclusive: other information, such as the age
of the managers or whether they hold an MBA, would have helped to complete our
understanding of the manager tenure effect, but it was not available to us.
Nevertheless, the analysis is informative: even if, globally speaking, manager
tenure does not affect the performance of the analysed funds, our procedure is
able to identify the funds on which it had the largest impact (positive or
negative) and lets us characterize their profile. This is particularly useful in
empirical finance for understanding the management behaviour of stars and best
performers.
Chapter 9
CONCLUSIONS
partial frontiers in revealing the impact of external factors masked, in the full
frontier case, by extreme observations.
In the analysis of the Italian National Research Council institutes we show
how to estimate parametric approximations of robust and nonparametric frontiers
in a full multivariate framework, by estimating robust parameters of a
multi-output Translog distance function, which have better properties than the
traditional parametric estimates.
Finally, the mutual funds application illustrates how the profile of groups of
funds can be characterized, starting from a global analysis of the comparison
set and moving towards a more detailed illustration of single extremely good or
bad performers.
These applications clearly demonstrate that the methodological toolbox presented
in Part I of this book is made up of methods that are often complementary
and that together help to shed light on important aspects of the production process.
Using the real data applications, we have shown that it is always useful to
start the analysis with some correlation matrix plots and some exploratory
multivariate techniques (see Härdle and Simar, 2003). After that, according to the
kind of data and problem to be analysed, an appropriate set of measures has to be
selected from the taxonomy we presented in Chapter 2. In many studies, the
robust methods presented in Chapters 4 and 5 may be the best choice.
Knowledge of the statistical properties of the estimators, presented
in Chapter 3, is a fundamental step: it makes the analyst aware of the main
problems and limitations of the traditional DEA/FDH approach, and shows
how to bootstrap in this context to allow for better inference.
A step further is taken in Chapter 4, where an alternative probabilistic formulation
of the activity analysis framework opens the field to a new set of
probabilistic efficiency measures. These measures keep a link with the traditional
FDH estimator (only asymptotically) while offering a wide range of properties useful
from an applied perspective; they also give us the opportunity to approximate
parametrically robust multi-output nonparametric frontiers, providing better
inference in this setting as well. In Chapter 5 we use this probabilistic approach to
introduce external-environmental factors in this general setting. The econometric
methodology that we detail and extend to the full multivariate case is particularly
useful for shedding light on the factors behind the patterns and for characterising
the profile of single DMUs and groups of DMUs, and not only for providing
aggregate or average tendencies.
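As a reminder of what such a probabilistic efficiency measure looks like in practice, a minimal Monte Carlo sketch of an input-oriented order-m efficiency score is given below. It assumes the usual resampling scheme (draw m units with replacement among those producing at least y, take the FDH-type score against the drawn units, and average over replications); the function name, its parameters and the data are hypothetical and chosen only for illustration.

```python
import random


def order_m_input_efficiency(x, y, X, Y, m, n_rep=200, seed=0):
    """Monte Carlo approximation of the input-oriented order-m efficiency of (x, y).

    x, y : input and output vectors of the evaluated unit
    X, Y : lists of input and output vectors of the observed units
    m    : number of units drawn in each replication
    """
    rng = random.Random(seed)
    # Units producing at least y in every output
    dominating = [xi for xi, yi in zip(X, Y)
                  if all(a >= b for a, b in zip(yi, y))]
    if not dominating:
        raise ValueError("no observed unit produces at least y")
    total = 0.0
    for _ in range(n_rep):
        draw = [rng.choice(dominating) for _ in range(m)]
        # FDH-type input score of (x, y) against the m drawn units
        total += min(max(xi_j / x_j for xi_j, x_j in zip(xi, x)) for xi in draw)
    return total / n_rep


# Hypothetical data: three units, two inputs, one output
X = [(1.0, 1.0), (2.0, 2.0), (4.0, 4.0)]
Y = [(1.0,), (1.0,), (1.0,)]
score_m1 = order_m_input_efficiency((4.0, 4.0), (1.0,), X, Y, m=1)
score_m50 = order_m_input_efficiency((4.0, 4.0), (1.0,), X, Y, m=50)
print(f"m = 1: {score_m1:.3f}   m = 50: {score_m50:.3f}")
```

With a large m the score approaches the FDH value (0.25 for the evaluated unit in this toy example), which illustrates the asymptotic link with the FDH estimator mentioned above; with a small m the benchmark is less extreme and the score is higher.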
We hope that this book has been useful for applied economists
who want to make use of these recently introduced techniques to evaluate and
explain the performance of DMUs in their field of research, without the burden
of the limitations of traditional methods. Nowadays the implementation of these
recent techniques is facilitated by the availability of free software such as FEAR
(see Wilson, 2005a,b,c).
At this point, readers should be aware that performance evaluation is
a complex task. A better understanding of the methods described in this book
is a necessary step in performance evaluation. However, this toolbox cannot be
fully exploited without a good knowledge of the field of application.
References
[1] Acs, Z. (ed.) (2000), Regional innovation, knowledge and global change. London:
Pinter.
[2] Adams, J.D. and Griliches, Z. (2000), “Research Productivity in a System of Univer-
sities”, in Encaoua, D. et al.(eds.), The Economics and Econometrics of Innovation,
105-140, Kluwer Academic Publishers, Netherlands.
[3] Afriat, S.N. (1967), “The Construction of Utility Functions from Expenditure Data”,
International Economic Review, 8, 67-77.
[5] Aigner, D.J. and Chu S.F. (1968), “On Estimating the Industry Production Function”,
American Economic Review, 58, 826-839.
[6] Aigner, D.J., Lovell, C.A.K., and Schmidt P. (1977), “Formulation and Estimation of
Stochastic Frontier Production Function Models”, Journal of Econometrics, 6, 21-37.
[7] Alchian, A. (1965) “Some economics of Property Rights”, Il Politico 30:4 (December),
816-829.
[8] Alchian, A., and R. Kessel (1962) “Competition, Monopoly, and the Pursuit of Money”,
in Aspects of Labor Economics. Princeton, NJ: Princeton University Press for National
Bureau of Economic Research.
[9] Allen, R., Athanassopoulos, A., Dyson, R.G. and Thanassoulis, E. (1997), “Weights
restrictions and value judgements in data envelopment analysis: evolution, develop-
ment and future directions”, Annals of Operations Research, 73, 13-34.
[10] Allison, P.D. and Stewart, J.A. (1974), “Productivity differences among scientists:
evidence for accumulative advantage”, American Sociological Review, 39 (4), 596-
606.
[11] Amel, D., Barnes C., Panetta F. and Salleo C. (2002), “Consolidation and efficiency
in the financial sector: a review of the international evidence”, Tema di discussione,
Banca d’Italia n. 464.
[12] Andersen, P. and Petersen, N.C. (1993), “A procedure for ranking efficient units in
data envelopment analysis”, Management Science, 39, 1261-1264.
[13] Angulo-Meza, L. and Pereira Estellita Lins M. (2002), “Review of Methods for
Increasing Discrimination in Data Envelopment Analysis”, Annals of Operations
Research, 116, 225-242.
[14] Annaert, J., van den Broeck, J., and Vennet R.V., (1999), “Determinants of Mutual
Fund Underperformance: A Bayesian Stochastic Frontier Approach”, paper presented
at the 6EWEPA, Copenhagen, Denmark.
[16] Aragon, Y., A. Daouia and C. Thomas-Agnan (2005), Nonparametric frontier estima-
tion: A conditional quantile-based approach, Econometric Theory, 21, 358–389.
[17] Audretsch, D.B. and Feldman, M. (1996) “R&D spillovers and the geography of
innovation and production”, American Economic Review, 86(3), 630-640.
[18] Avveduto, S. (2002), “Human resources in science and technology”, paper presented
to the CISS Moncalieri Workshop, December 11.
[19] Banker, R.D. (1993) “Maximum likelihood, Consistency and Data Envelopment
Analysis: A Statistical Foundation”, Management Science, 39, 10, 1265-1273.
[20] Banker, R.D., Chang H., and Cooper W.W. (1996), “Simulation studies of efficiency,
returns to scale, and misspecification with nonlinear functions in DEA”, Annals of
Operations Research, 66, 233-253.
[21] Banker, R.D., Charnes, A., and Cooper W.W. (1984), “Some Models for Estimating
Technical and Scale Inefficiencies in Data Envelopment Analysis”, Management
Science, 30, 1078-1092.
[22] Banker, R.D. and Maindiratta, A. (1988), “Nonparametric Analysis of Technical and
Allocative Efficiencies in Production”, Econometrica, 56, 1315-1332.
[23] Banker, R.D. and R.C. Morey (1986a), “Efficiency analysis for exogenously fixed
inputs and outputs”, Operations Research, 34(4), 513–521.
[24] Banker, R.D. and R.C. Morey (1986b), “The use of categorical variables in Data
Envelopment Analysis”, Management Science, 32 (12), 1613-1627.
[25] Barnett, V. and Lewis T. (1995). Outliers in Statistical Data, Chichester, Wiley.
[26] Bartelsman, E.J. and Doms M. (2000), “Understanding productivity: lessons from
longitudinal microdata”, Journal of Economic Literature, Vol. XXXVIII, pp. 569-594.
[27] Barth, J.R., Nolle D.E., and Rice T.N. (1997), “Commercial Banking Structure,
Regulation, and Performance: An International Comparison”, in Papadimitriou D.B.
(ed.), Modernizing Financial Systems, New York: Oxford University Press.
[28] Barth, J.R., Dan Brumbaugh, R.Jr., and Wilcox J.A. (2000), “Policy Watch: The
Repeal of Glass-Steagall and the Advent of Broad Banking”, Journal of Economic
Perspectives 14, 191-204.
[29] Basso, A., S. Funari, (2001), “A Data Envelopment Analysis Approach to Measure
the Mutual Fund Performance”, European Journal of Operational Research, 135 (3),
477-492.
[30] Berger, A. N., Kashyap A.K., and Scalise J. M. (1995), “The Transformation of the
U.S. Banking Industry: What A Long, Strange Trip It’s Been,” Brookings Papers on
Economic Activity, 2, 55-218.
[31] Berger, A.N. and Humphrey, D.B. (1997), “Efficiency of Financial Institutions: Inter-
national Survey and Directions for Future Research”, European Journal of Operational
Research 98, 175-213.
[32] Berger, A.N., Cummins, J.D. and Weiss, M.A. (1997), “The Coexistence of Multiple
Distribution Systems for Financial Services: The Case of Property-Liability Insur-
ance,” Journal of Business 70, 515-546.
[33] Bergson, A. (1961), National Income of the Soviet Union since 1928. Cambridge, MA:
Harvard University Press.
[35] Bessent, A. and Bessent, W.E. (1980). “Determining the comparative efficiency of
schools through DEA”, Educational Administration Quarterly, 16, 57-75.
[36] Bessent A., Bessent W., Kennington J., and Reagan B. (1982), “An Application of
mathematical programming to assess productivity in the Houston independent school
district”, Management Science, 28, 1355-1367.
[37] Bickel, P.J. and Freedman, D.A. (1981), “Some Asymptotic Theory for the Bootstrap”,
Annals of Statistics, 9, 1196-1217.
[38] Bjurek, H. Hjalmarsson, L. and Forsund, F.R. (1990), “Deterministic Parametric and
Nonparametric Estimation of Efficiency in Service Production: A Comparison”, Jour-
nal of Econometrics, 46, 213-228.
[39] Bogetoft, P. (2000), “DEA and Activity Planning under Asymmetric Information”,
Journal of Productivity Analysis, 13 (1), 7-48.
[41] Bonaccorsi A. and Daraio, C. (2003a), “A robust nonparametric approach to the analy-
sis of scientific productivity”, Research Evaluation, 12 (1), 47-69.
[42] Bonaccorsi A. and Daraio, C. (2003b), “Age effects in scientific productivity. The case
of the Italian National Research Council (CNR)”, Scientometrics, 58 (1), 47-88.
[44] Bonaccorsi A. and Daraio, C. (2005), “Exploring size and agglomeration effects on
public research productivity”, Scientometrics, Vol. 63, No. 1, 87-120.
[45] Bonaccorsi, A., Daraio C. and Simar, L. (2006), “Advanced Indicators of Productivity
of Universities. An application of Robust Nonparametric Methods to Italian data”,
Scientometrics, Vol. 66, No. 2, 389-410.
[47] Briec, W., Kerstens, K., and Lesourd J.B., (2004), “Single Period Markowitz Port-
folio Selection, Performance Gauging and Duality: A variation on the Luenberger’s
Shortage Function”, Journal of Optimization Theory and Applications, 120(1), 1-27.
[48] Briec, W., Kerstens K. and Vanden Eeckaut P. (2004a), “Non-convex Technologies and
Cost Functions: Definitions, Duality and Nonparametric Tests of Convexity”, Journal
of Economics, 81 (2), 155-192.
[50] Briec, W., and Lesourd J.B., (2000), “The Efficiency of Investment Fund Management:
An Applied Stochastic Frontier Model”, in: C.L. Dunis (ed.), Advances in Quantitative
Asset Management, Kluwer, Boston, 41-59.
[52] Castells, M. and Hall, P. (1994), Technopoles of the world. The making of the 21st
century industrial complexes. London: Routledge.
[53] Caves, D.W., Christensen, L.R. and Diewert, W.E. (1982), “The Economic Theory of
Index Numbers and the Measurement of Input, Output and Productivity”, Econometrica,
50, 1393-1414.
[54] Cazals, C., Florens, J.P. and Simar, L. (2002), “Nonparametric frontier estimation: a
robust approach”, Journal of Econometrics, 106, 1-25.
[55] Cesari, R. Panetta, F. (2002), “The performance of the Italian equity funds”, Journal
of Banking and Finance, 26, 99-126.
[56] Chambers, R.G., Chung Y. and Färe, R. (1996), “Benefit and distance functions”,
Journal of Economic Theory, 70, 407-419.
[57] Chambers, R. G., and Quiggin J. (2000), Uncertainty, Production, Choice and Agency:
The State-Contingent Approach, Cambridge University Press, New York.
[59] Charnes, A. and Cooper, W.W. (1985), “Preface to Topics in Data Envelopment Analy-
sis”, Annals of Operations Research, 2, 59-94.
[60] Charnes, A. and Cooper, W.W. (1961), Management Models and Industrial Applica-
tions of Linear Programming, Wiley, New York.
[61] Charnes, A., Cooper, W.W., Lewin A.Y and Seiford L.M (1994), (edited by), Data En-
velopment Analysis. Theory, Methodology and Applications, Kluwer Academic Pub-
lishers, Norwell USA.
[62] Charnes, A., Cooper, W.W., and Rhodes, E. (1978), “Measuring the Efficiency of
Decision Making Units”, European Journal of Operational Research, 2, 429-444.
[63] Chavas, J.-P. and Cox, T.L. (1988), “A Nonparametric Analysis of Agricultural Tech-
nology”, American Journal of Agricultural Economics, 70, 303-310.
[64] Chavas, J.-P. and Cox, T.L. (1990), “A Non-Parametric Analysis of Productivity: The
Case of U.S. and Japanese Manufacturing”, American Economic Review, 80, 450-464.
[65] Cherchye, L., Kuosmanen T. P., and Post G.T., (2000) “What is the economic meaning
of FDH? A reply to Thrall”, Journal of Productivity Analysis, 13, 263-267.
[66] Cherchye, L., Kuosmanen, T. and Post, G.T. (2001), “Nonparametric Production
Analysis under Alternative Price Conditions”, CES Discussion Paper 01.05.
[67] Chevalier, J., and Ellison G. (1999a), “Are some Mutual Fund Managers Better Than
Others? Cross-sectional Patterns in Behavior and Performance”, Journal of Finance,
54 (3), 875-898.
[68] Chevalier, J., and Ellison G. (1999b), “Career concerns of Mutual Fund Managers”,
The Quarterly Journal of Economics, 114 (2), 389-432.
[69] Christensen L.R., Jorgenson D. W., and Lau L.J., (1973), “Transcendental Logarithmic
Production Frontiers”, Review of Economics and Statistics, 55, 28-45.
[70] Clark, G.L., Feldman M.P. and Gertler M.S., edited by, (2000), The Oxford Handbook
of Economic Geography, Oxford University Press, NY.
[71] Consiglio Nazionale delle Ricerche (1998), CNR Report 1998. CNR Roma.
[72] Coelli, T. (1996), Assessing the Performance of Australian Universities using Data
Envelopment Analysis, mimeo, Centre for Efficiency and Productivity Analysis, Uni-
versity of New England.
[73] Coelli, T. (2000), “On the econometric estimation of the distance function
representation of a production technology”, CORE Discussion Paper 2000-42, Université
Catholique de Louvain, and CEPA, School of Economic Studies, University of New
England.
[74] Coelli, T. and Perelman S., (1996), “Efficiency Measurement, Multiple-output
Technologies and Distance Functions: with application to European Railways”, CREPP
Working Paper no. 96/05, Université de Liège, Belgium.
[75] Coelli, T. and Perelman S., (1999), “A comparison of parametric and nonparamet-
ric distance functions: with application to European railways”, European Journal of
Operational Research, 117, 326-339.
[76] Coelli, T. and Perelman S., (2000), “Technical efficiency of European railways: a
distance function approach”, Applied Economics, 32, 1967-1976.
[77] Coelli, T., Rao, D.S.P. and Battese, G.E. (1998), An Introduction to Efficiency Analysis,
Kluwer Academic Publishers.
[79] Cooke, P. and Morgan K. (1998) The associational economy. Firms, regions and in-
novation. Oxford: Oxford University Press.
[80] Cooper, W.W., Li S., Seiford L.M., Tone K., Thrall R.M., and Zhu J. (2001), “Sensi-
tivity and Stability Analysis in DEA: Some Recent Developments”, Journal of Pro-
ductivity Analysis, 15, 217-246.
[81] Cooper, W.W., Seiford L.M., and Tone K. (2000), Data Envelopment Analysis: A
Comprehensive Text with Models, Applications, References and DEA-Solver Software,
Kluwer Academic Publishers, Boston.
[82] Cornwell C., Schmidt P., and Sickles R. C. (1990), “Productivity Frontiers with cross
sectional and time series variation in efficiency levels”, Journal of Econometrics, 46,
185-200.
[83] Cummins, J.D., Turchetti, G. and Weiss, M. (1996), “Productivity and Technical Ef-
ficiency in the Italian Insurance Industry,” Working paper, Wharton Financial Institu-
tions Center, University of Pennsylvania, Philadelphia.
[84] Cummins, J.D. and Weiss, M.A. (2001), “Analyzing Firm Performance in the Insurance
Industry Using Frontier Efficiency Methods,” in Georges Dionne (ed.), Handbook of
Insurance Economics, Boston: Kluwer Academic Publishers.
[85] Cummins, J.D., Weiss, M.A., and Zi, H. (1999), “Organizational Form and Efficiency:
An Analysis of Stock and Mutual Property-Liability Insurers”, Management Science
45, 1254-1269.
[86] Daniel, H.D. and Fisch, R. (1990), “Research performance evaluation in the German
university sector”, Scientometrics, Vol. 19 (5-6), 349-361.
[87] Dantzig, G.B. (1963), Linear Programming and its Extensions, Princeton University
Press, Princeton.
[90] Daouia, A., J.P. Florens and L. Simar (2005), Functional Convergence of Quantile-
type Frontiers with Application to Parametric Approximations, Discussion paper 0538,
Institut de Statistique, UCL.
[91] Daraio C. (2003), Comparative Efficiency and Productivity Analysis based on non-
parametric and robust nonparametric methods. Methodology and Applications, Doc-
toral dissertation, Scuola Superiore Sant’Anna, Pisa (Italy).
[92] Daraio C., Simar, L. (2004), “A Robust Nonparametric Approach to Evaluate and Ex-
plain the Performance of Mutual Funds”, Discussion Paper no. 0412, Institut de Sta-
tistique, UCL, Belgium, forthcoming in European Journal of Operational Research.
[94] Daraio C. and Simar, L. (2005b), “Conditional Nonparametric Frontier Models for
Convex and Non Convex Technologies: A unifying Approach”, Discussion Paper no.
0502, Institut de Statistique, UCL, Belgium, forthcoming in Journal of Productivity
Analysis.
[95] David, P.A. (1995), “Positive Feedbacks and Research Productivity in Science:
Reopening another Black Box”, in Granstrand O. (ed.), Economics of Technology,
North-Holland, Amsterdam.
[96] de Alessi, L. (1974) “An economic analysis of government ownership and regulation:
theory and the evidence from the electric power industry”, Public Choice, 19:1, 1-42.
[97] de Alessi, L. (1983) “Property rights, transaction costs, and X-efficiency: an essay in
economic theory”, American Economic Review, 73:1 (March), 64-81.
[99] Debreu, G. (1951), “The Coefficient of Resource Utilization”, Econometrica, 19, 273-
292.
[100] Deckle, R. (1988), “The Japanese ’Big Bang’ financial reforms and marketing impli-
cations”, Journal of Asian Economics, 9, 237-249.
[101] Deprins, D. and Simar, L. (1983), “On Farrell Measures of Technical Efficiency”,
Recherches économiques de Louvain, 49, 123-137.
[102] Deprins, D. and L. Simar (1985), A Note on the Asymptotic Relative Efficiency of the
M.L.E. in a Linear Model with Gamma Disturbances, Journal of Econometrics, 27,
383–386.
[104] Deprins, D., Simar L. and Tulkens H. (1984), “Measuring labor-efficiency in post
offices”, in Marchand, M., Pestieau, P. and Tulkens, H. (eds.) The Performance of
public enterprises - Concepts and Measurement, Amsterdam, North-Holland, 243-
267.
[105] Detzel L.F., R. Weigand (1998), “Explaining Persistence in Mutual Fund Perfor-
mance”, Financial Services Review, 7 (1), 45-55.
[106] Diewert, W.E. and Parkan, C. (1983), “Linear Programming Tests of Regularity Condi-
tions for Production Frontiers”, in Eichhorn, W., Henn, R., Neumann, K. and Shephard,
R.W. (eds.) Quantitative Studies on Production and Prices, Wuerzburg and Vienna,
Physica-Verlag.
[108] Dorfman R., Samuelson P. and Solow R. (1958), Linear Programming and Economic
Analysis, McGraw Hill Text.
[109] Efron, B. (1979), “Bootstrap Methods: another look at the Jackknife”, The Annals of
Statistics, Vol. 7, No.1, 1-26.
[110] Efron, B., and Tibshirani, R.J. (1993), An introduction to the Bootstrap, Chapman and
Hall, NY.
[112] Färe, R. (1975), “Efficiency and the Production Function”, Zeitschrift fuer Nation-
aloekonomie, 35, 317-324.
[113] Färe, R., Grosskopf, S., Lindgren, B. and Roos, P. (1989), “Productivity Developments
in Swedish Hospitals: A Malmquist Output Index Approach”, in Charnes, A, Cooper,
W.W., Lewin, A. and Seiford, L. (eds.), Data Envelopment Analysis: Theory, Methodology
and Applications, Boston: Kluwer Academic Publishers.
[114] Färe, R., Grosskopf, S. and Lovell, C.A.K. (1985), The Measurement of Efficiency of
Production, Boston: Kluwer-Nijhoff Publishing.
[115] Färe, R., Grosskopf, S. and Lovell, C.A.K. (1992), “Indirect Productivity Measure-
ment”, Journal of Productivity Analysis, 2, 283-298.
[116] Färe, R., Grosskopf, S. and Lovell, C.A.K. (1994), Production Frontiers, Cambridge
University Press, Cambridge.
[117] Färe, R., Grosskopf, S. and Russell, R.R. (1998), Index Numbers: Essays in Honour
of Sten Malmquist, Boston: Kluwer Academic Publishers.
[118] Färe, R., S. Grosskopf, C.A. K. Lovell and C. Pasurka (1989), “Multilateral Productiv-
ity Comparisons when some Outputs are Undesirable: a Nonparametric Approach”,
Review of Economics and Statistics 71 (1), 90-98.
[119] Färe, R., Grosskopf, S. and Weber, W. (1989), “Measuring school district perfor-
mance”, Public Finance Quarterly, 17, 409-428.
[120] Färe, R., Grosskopf, S., (2004), New Directions: Efficiency and Productivity, Kluwer
Academic Publishers.
[121] Färe, R. and Lovell, C.A.K. (1978), “Measuring the Technical Efficiency of Produc-
tion”, Journal of Economic Theory, 19, 150-162.
[122] Färe, R. and Primont, D. (1996), “The opportunity cost of duality”, Journal of Pro-
ductivity Analysis, 7, 213-224.
[123] Färe, R. and V. Zelenyuk (2003), “On Aggregate Farrell Efficiency Scores”, European
Journal of Operational Research 146:3, 615-620.
[124] Farrell, M.J. (1957), “The measurement of the Productive Efficiency”, Journal of the
Royal Statistical Society, Series A, CXX, Part 3, 253-290.
[125] Farrell, M.J. (1959), “Convexity assumption in theory of competitive markets”, Journal
of Political Economy, 67, 377-391.
[126] Farrell, M.J. and Fieldhouse, M. (1962), “Estimating Efficient Production Functions
under Increasing Return to Scale”, Journal of the Royal Statistical Society, Series A,
CXXV, Part 2, 252-267.
[127] Feldman, M.P. (2000), “Location and Innovation: the New Economic Geography of
Innovation, Spillovers, and Agglomeration”, in Clark, G.L., Feldman M.P. and Gertler
M.S., edited by, The Oxford Handbook of Economic Geography, Oxford University
Press, NY, 373-394.
[128] Ferrier G.D., and Hirschberg J. G. (1997), “Bootstrapping Confidence Intervals for
Linear Programming Efficiency Scores: with an illustration using Italian Bank Data”,
The Journal of Productivity Analysis, 8, 19-33.
[129] Ferrier G.D., and Hirschberg J. G. (1999), “Can we bootstrap DEA scores?”, The
Journal of Productivity Analysis, 11.
[130] Fisher, I. (1922), The Making of Index Numbers, Boston: Houghton Mifflin.
[132] Forsund, F.R., and N. Sarafoglou (2002), “On the origins of Data Envelopment Analy-
sis”, Journal of Productivity Analysis, 17 (1/2), 23–40.
[133] Fortin R., S. Michelson, and J. Wagner (1999), “Does Mutual Fund Manager Tenure
Matter?” Journal of Financial Planning, August, 12.
[134] Freedman, D.A. (1981), “Bootstrapping regression models”, Annals of Statistics, Vol
9, 6, 1218–1228.
[135] Fried, H.O, Lovell, C.A.K. and Schmidt S.S. (1993), edited by, The measurement
of Productive Efficiency. Techniques and Applications, New York Oxford, Oxford
University Press.
[136] Fried, H.O, Lovell, C.A.K. and Schmidt S.S. (2006), edited by, The Measurement of
Productive Efficiency, 2nd Edition, New York Oxford, Oxford University Press.
[137] Fried, H.O., C.A.K. Lovell, S.S. Schmidt and S. Yaisawarng (2002), “Accounting for
environmental effects and statistical noise in Data Envelopment Analysis”, Journal of
Productivity Analysis, 17 (1/2), 157–174.
[138] Fried, H.O., S.S. Schmidt and S. Yaisawarng (1999), “Incorporating the operating
environment into a nonparametric measure of technical efficiency”, Journal of Pro-
ductivity Analysis, 12, 249–267.
[139] Garfield, E. and Dorof, W.A. (1992), “Citation data: their use as quantitative indicators
for science and technology evaluation and policy making”, Science and Public Policy,
19 (5), 321-327.
[140] Gattoufi, S., M. Oral, and A. Reisman (2004), “Data envelopment analysis literature:
A bibliography update (1951-2001)”, Journal of Socio-Economic Planning Sciences,
Vol.38, Issues 2-3, 159-229.
[141] Gijbels, I., Mammen, E., Park, B.U., and Simar L. (1999), “On Estimation of Monotone
and Concave Frontier Functions”, Journal of the American Statistical Association, 94,
220-228.
[142] Girod, O. and Triantis, K. (1999), “The Evaluation of Productive Efficiency Using
a Fuzzy Mathematical Programming Approach: The Case of the Newspaper Preprint
Insertion Process”, IEEE Transactions on Engineering Management, 46, 1-15.
[143] Golec, J. (1996), “The Effects of Mutual Fund Managers’ Characteristics on Their
Portfolio Performance, Risk and Fees”, Financial Services Review, 5 (2), 133-148.
[144] Goto, I. (1999), “Japan: The Finalization of the Big Bang,” International Financial
Law Review (July).
[149] Grosskopf S., Margaritis D. and Valdmanis V. (1995), “Estimating output substitutabil-
ity of hospital services: A distance function approach”, European Journal of Opera-
tional Research, 80, 575-587.
[151] Grosskopf, S., Hayes, K., Taylor, L. and Weber, W. (1997), “Budget constrained frontier
measures of fiscal equity and efficiency in schooling”, Review of Economics and
Statistics, 79, 116-124.
[152] Grosskopf, S. (2003), “Some Remarks on Productivity and its Decompositions”, Jour-
nal of Productivity Analysis, 20, 459-474.
[154] Grosskopf, S., Hayes, K., Taylor L. and Weber W. (1999) “Anticipating the conse-
quences of school reform: a new use of DEA”, Management Science, 45, 608-620.
[155] Hall, P. and Simar, L. (2002), “Estimating a changepoint, boundary or frontier in the
presence of observation error”, Journal of the American Statistical Association, 97,
523-534.
[156] Halme, M., Joro, T., Korhonen, P., Salo, S. and Wallenius, J. (2000), “Value efficiency
analysis for incorporating preference information in DEA”, Management Science, 45,
103-115.
[157] Hanoch, G. and Rothschild, M. (1972), “Testing the Assumptions of Production Theory:
A Nonparametric Approach”, Journal of Political Economy, 80, 256-275.
[158] Hansmann, H. (1988) “Ownership of the firm”, Journal of Law, Economics and Or-
ganization, 4:2 (Fall), 267-304.
[160] Härdle, W. and L. Simar (2003), Applied Multivariate Statistical Analysis, Springer-
Verlag, Berlin.
[161] Harker P.T. and S.A. Zenios (2000), edited by, Performance of Financial institutions.
Efficiency, Innovation, Regulation, Cambridge University Press, UK.
[162] Henderson, D.J. and L. Simar (2005), “A Fully Nonparametric Stochastic Frontier
Model for Panel Data”, Discussion paper 0525, Institut de Statistique, UCL.
[163] Hess, T. and Trauth, T. (1998), “Towards A Single European Insurance Market”,
International Journal of Business 3, 89-100.
[164] Hicks, J.R. (1935), “The theory of Monopoly: A Survey”, Econometrica, 3:1 (January),
1-20.
[165] Hogan, A.M.B. (1995), “Regulation of the Single European Insurance Market”, Jour-
nal of Insurance Regulation, 13, 329-358.
[166] Holbrook J.A.D. (1992a), “Basic indicators of scientific and technological perfor-
mance”, Science and Public Policy, 19 (5), 267-273.
[167] Holbrook J.A.D. (1992b), “Why measure science? ”, Science and Public Policy, 19
(5), 262-266.
[168] Holmstrom, B. R., and J. Tirole (1989), “The theory of the firm”, in R. Schmalensee
and R. D. Willig, eds., Handbook of Industrial Organization, Volume I. Amsterdam:
Elsevier Science Publishers.
[169] Ippolito, R.A. (1993), “On Studies of Mutual Fund Performance, 1962-1991”, Finan-
cial Analysts Journal, 49 (1), 42-50.
[171] Jeong, S.O. , B. U. Park and L. Simar (2006), “Nonparametric conditional efficiency
measures: asymptotic properties”. Discussion paper 0604, Institut de Statistique, UCL.
[172] Jeong, S.O. and L. Simar (2005), “Linearly interpolated FDH efficiency score for
nonconvex frontiers”, Discussion 0501, Institut de Statistique, UCL.
[174] Kao, C. and Liu, S.T. (1999), “Fuzzy Efficiency Measures in Data Envelopment Analy-
sis”, Fuzzy Sets and Systems, forthcoming.
[176] King, D. (2004), “The Scientific impact of Nations”, Nature, 430, 311-316.
[177] Kneip, A. and Simar, L. (1996), “A general framework for frontier estimation with
panel data”, The Journal of Productivity Analysis, 7, 187-212.
[178] Kneip, A., Park B.U. and Simar L. (1998), “A Note on the Convergence of Nonpara-
metric DEA Estimators for Production Efficiency Scores”, Econometric Theory, 14,
783-793.
[179] Kneip, A., Simar, L. and Wilson, P.W. (2003), “Asymptotics for DEA Estimators in
Nonparametric Frontier Models”, Discussion Paper no. 0317, Institut de Statistique,
UCL.
[181] Koopmans, T.C. (1957), Three Essays on the State of Economic Science. New York:
McGraw Hill.
[182] Korhonen, P., Tainio, R. and Wallenius J. (2001), “Value efficiency analysis of acad-
emic research”, European Journal of Operational Research, 130, 121-132.
[184] Korostelev, A., Simar L. and Tsybakov A.B. (1995), “Efficient estimation of monotone
boundaries”, The Annals of Statistics, 23, 476-489.
[185] Krugman P. (1991), “Increasing returns and economic geography”, Journal of Political
Economy, 99(3), 483-499.
[186] Kumbhakar S. C., Lovell C.A.K. (2000), Stochastic Frontier Analysis, Cambridge
University Press, UK.
[187] Kumbhakar, S.C. , Park, B.U., Simar, L. and E.G. Tsionas (2004), “Nonparametric
stochastic frontiers: a local likelihood approach”, Discussion paper 0417, Institut de
Statistique, UCL, forthcoming in Journal of Econometrics.
[188] Kuosmanen, T. and Post, G.T. (2001), “Measuring Economic Efficiency with Incom-
plete Price Information: With an Application to European Commercial Banks”, Euro-
pean Journal of Operational Research, 134 (1), 43-58.
[189] Land, K.C., Lovell C.A.K., and Thore S. (1993), “Chance-Constrained Data Envelopment
Analysis”, Managerial and Decision Economics, 14 (6), 541-554.
[190] Laredo, P., Mustar P. (eds), (2001), Research and Innovation policies in the new global
economy. An international comparative analysis, Edward Elgar.
[192] Leibenstein, H. (1975), “Aspects of the X-efficiency theory of the firm”, Bell Journal
of Economics, 6, 580-606.
[193] Leibenstein, H. (1976), Beyond economic man. Cambridge, MA: Harvard University
Press.
[195] Leibenstein, H. (1987), Inside the firm. Cambridge, MA: Harvard University Press.
[196] Lemak D., P. Satish (1996), “Mutual Fund Performance and Managers’ Terms of
Service: Are There Performance Differences?”, The Journal of Investing, Winter, pp.
59-63.
[197] Leontief, W.W. (1941), The Structure of the American Economy 1919-1939. New
York: Oxford University Press.
[198] Leontief, W.W. (1953), Studies in the Structure of the American Economy. New York:
Oxford University Press.
[199] Levin, S.G. and P. E. Stephan (1991), “Research productivity over the life cycle:
evidence for academic scientists”, American Economic Review, 81 (1), March, 114-
32.
[201] Lewison, G. (1998), “New bibliometric techniques for the evaluation of medical
schools”, Scientometrics, Vol. 41, 5-16.
[202] Li, X.B. and Reeves, G.R. (1999), “A multiple criteria approach to data envelopment
analysis”, European Journal of Operational Research, 115, 507-517.
[203] Lindsay, C.M. (1976), “A theory of government enterprise”, Journal of Political Econ-
omy, 84, 1061-1077.
[204] Lovell, C.A.K. (1993), “Production Frontiers and Productive Efficiency,” in H.O.
Fried, C.A.K. Lovell, and S.S. Schmidt, eds., The Measurement of Productive Ef-
ficiency, New York, Oxford University Press.
[205] Lovell, C.A.K. (2001), “Future Research Opportunities in Efficiency and Productivity
Analysis”, in A. Alvarez (ed.), La Medición de la Eficiencia Productiva, Pirámide.
[207] Luenberger, D.G. (1992), “Benefit functions and duality”, Journal of Mathematical
Economics, 21, 461-481.
[208] Luwel, M. (2004), “The Use of Input Data in the Performance Analysis of R&D Systems.
Potentialities and Pitfalls”, in H.F. Moed, W. Glanzel and U. Schmoch (edited
by), Handbook of Quantitative Science and Technology Research, Kluwer Academic
Publishers, 315-338.
[209] Malmquist, S. (1953), “Index Numbers and Indifference Surfaces”, Trabajos de Esta-
tistica, 4, 209-242.
[212] Martin, B.R. (1996), “The use of multiple indicators in the assessment of basic re-
search”, Scientometrics, Vol. 36, 343.
[214] Mas-Colell A., Whinston M.D., Green J.R. (1995) Microeconomic Theory, Oxford
University Press, USA.
[215] May R. (1993), “The scientific wealth of nations”, Science, 275, 793-796.
[216] Meeusen, W., Van den Broeck J. (1977), “Efficiency Estimation from Cobb-Douglas
Production Functions With Composed Error”, International Economic Review, 18,
435-444.
[217] Merton, R.K. (1968), “The Matthew effect in science”, Science, 159 Jan-Mar, 56-63.
[218] Milgrom P., Roberts J. (1992), Economics, organization and management, Prentice
Hall, Englewood Cliffs.
[219] Moed H.F., W. Glanzel and U. Schmoch (2004)(edited by), Handbook of Quantitative
Science and Technology Research, Kluwer Academic Publishers.
[220] Moed, H.F., van Leeuwen T.N. (1996), “Impact factors can mislead”, Nature, 381:186.
[221] Moorsteen, R.H. (1961), “On measuring Productive Potential and Relative Efficiency”,
Quarterly Journal of Economics, 75 (3), 451-467.
[222] Morey, M.R., Morey, R.C. (1999), “Mutual fund performance appraisals: a multi-
horizon perspective with endogenous benchmarking”, Omega, Int. J. Mgmt Sci., 27,
241-258.
[223] Morroni, M. (2006), Knowledge, Scale and Transactions in the Theory of the Firm,
forthcoming Cambridge University Press, Cambridge.
[224] Mouchart M., and L. Simar (2002), “Efficiency Analysis of Air Controllers: First
Insights”, Consulting Report No. 0202, Institut de Statistique, UCL, Belgium.
[225] Mullins N., Snizek W., Oehler K. (1988), The structural analysis of a scientific paper,
in Van Raan A.F.J., Handbook of Quantitative Studies of Science and Technology, pp.
81-105, North Holland, Amsterdam.
[226] Murthi, B., Choi, Y. and Desai, P. (1997), “Efficiency of Mutual Funds and Portfolio
Performance Measurement: a Nonparametric Measurement”, European Journal of
Operational Research, 98, 408-418.
[227] Nadaraya, E.A. (1964), “On estimating regression”, Theory of Probability Applica-
tions, 9, 141-142.
[229] Narin, F., Hamilton, K.S. (1996), “Bibliometric performance measures”, Scientomet-
rics, Vol. 36, 293-310.
[230] Narin F., Olivastro D., Stevens K. A. (1994), “Bibliometrics - Theory, Practice and
Problems”, Evaluation Review, Vol. 18, n. 1.
[231] Niskanen, W.A. Jr. (1971), Bureaucracy and representative government. Chicago,
Aldine Press.
[232] Okubo Y. (1997), “Bibliometric indicators and analysis of research systems: methods
and examples”, STI Working Papers 1997/1, OECD, Paris.
[233] Olesen, O.B., and Petersen, N.C. (1995), “Chance constrained Efficiency Evaluation”,
Management Science, 41(3), 442-457.
[234] Park, B.U. and Simar, L. (1994), “Efficient Semiparametric Estimation in a Stochastic
Frontier Model”, Journal of the American Statistical Association, 89, no. 427, 929-935.
[235] Park, B.U. Sickles, R.C., and Simar, L. (1998), “Stochastic panel frontiers: A semi-
parametric approach”, Journal of Econometrics, 84, 273-301.
[236] Park, B.U., R. Sickles and L. Simar (2003a), “Semiparametric Efficient Estimation of
AR(1) Panel Data Models”, Journal of Econometrics, vol 117, 2, 279-309. Corrigen-
dum to “Semiparametric-efficient estimation of AR(1) panel data models”, Journal of
Econometrics, vol 117, 2, 311.
[237] Park, B.U., R. Sickles and L. Simar (2003b), “Semiparametric Efficient Estimation in
Dynamic Panel Data Models”, Discussion Paper 0315, Institut de Statistique, UCL,
forthcoming in Journal of Econometrics.
[238] Park, B.U., Simar, L. and Weiner C. (2000), “The FDH Estimator for Productivity
Efficiency Scores: Asymptotic Properties”, Econometric Theory, 16, 855-877.
[239] Park, B.U., L. Simar and V. Zelenyuk (2006), “Local Likelihood Estimation of Trun-
cated Regression and Its Partial Derivatives: Theory and Application”. Discussion
paper 0606, Institut de Statistique, UCL.
[240] Pedraja-Chaparro, R., Salinas-Jimenez, J., Smith, J. and Smith, P. (1997), “On the role
of weight restrictions in DEA”, Journal of Productivity Analysis, 8, 215-230.
[241] Perelman S., Santin D. (2005), “Measuring educational efficiency at student level with
parametric stochastic distance functions: An application to Spanish PISA results”,
paper presented at the X EWEPA, Brussels, June-July 2005.
[243] Porter G., J. Trifts, (1998) “Performance Persistence of Experienced Mutual Fund
Managers”, Financial Services Review, 7 (1), 57-68.
[244] Pyke, F. Becattini G., Sengenberger W. (1986), “Industrial districts and inter-firm
co-operation in Italy”, International Labour Office, Geneve.
[245] Ramsden P. (1994), “Describing and explaining research productivity”, Higher Edu-
cation, Vol. 28.
[246] Ray, S.C. and Bhadra D. (1993), “Nonparametric Tests of Cost Minimizing Behavior:
A Study of Indian Farms”, American Journal of Agricultural Economics, 73 (Nov),
990-999.
[247] Ray, S.C. (2004), Data Envelopment Analysis, Theory and Techniques for Economics
and Operations Research, Cambridge University Press, US.
[249] Ritter, C. and Simar, L. (1997), “Pitfalls of Normal-Gamma Stochastic Frontier Mod-
els”, Journal of Productivity Analysis, 8(2), 167-182.
[250] Rosenberg N. (1991), “Critical issues in science policy research”, Science and Public
Policy, Vol. 18, n. 6, pp. 335-346.
[251] Rousseau, S. and Rousseau, R. (1997), “Data Envelopment Analysis as a tool for
constructing scientometric indicators”, Scientometrics, Vol. 40, 45-56.
[252] Rousseau, S. and Rousseau, R. (1998), “The scientific wealth of European nations:
taking effectiveness into account”, Scientometrics, Vol. 42, 75-87.
[253] Russell, R.R. (1985), “Measures of Technical Efficiency”, Journal of Economic Theory,
35 (1), 109-126.
[254] Russell, R.R. (1988), “On the Axiomatic Approach to the Measurement of Technical
Efficiency”, in W. Eichhorn, ed. (1988), Measurement in Economics: Theory and
Applications of Economic Indices, Heidelberg: Physica-Verlag.
[257] Saxenian A. (1996), Regional advantage. Culture and competition in Silicon Valley
and Route 128, Boston, Harvard University Press.
[258] Scherer, F.M. (1980), Industrial market structure and economic performance,
Houghton Mifflin, Boston.
[259] Schmidt, P. (1976), “On the Statistical Estimation of Parametric Frontier Production
Functions”, Review of Economics and Statistics, 58, 238-239.
[260] Schmidt P. and Sickles R. C. (1984), “Productivity Frontiers and Panel Data”, Journal
of Business and Economic Statistics, 2, 367-374.
[262] Schubert A., Braun T. (1993), “Reference standards for citation based assessments”,
Scientometrics, Vol. 26, n. 1, pp. 21-35.
[264] Schubert A., Glanzel W., Braun T. (1988), “Against absolute methods: relative sci-
entometric indicators and relational charts as evaluation tools”, in Van Raan A.F.J.,
Handbook of Quantitative Studies of Science and Technology, pp. 137-176.
[265] Schuster, E.F. (1985), “Incorporating Support Constraints into Nonparametric Estima-
tors of Densities”, Communication in Statistics - Theory and Methods, 14, 1123-1136.
[266] Scott D. W. (1992), Multivariate Density Estimation, Theory, Practice and Visualiza-
tion, John Wiley & Sons, NY.
[267] Scott, A.J. (ed.) (2001) Global city-Regions. Oxford, Oxford University Press.
[268] Seaver, B. and Triantis, K. (1992), “A Fuzzy Clustering Approach Used in Evaluating
Technical Efficiency Measures in Manufacturing”, Journal of Productivity Analysis,
3, 337-363.
[269] Seglen, P.O. (1997), “Why the impact factor of journals should not be used for evalu-
ating research”, BMJ, 314: 498-502.
[270] Seiford, L.M. (1994), “A DEA bibliography 1978-1992”, in Charnes A., Cooper W.W.,
Lewin A., and Seiford L. (eds.), Data Envelopment Analysis: Theory, Methodology,
Applications, Kluwer Academic Publishers, 437-470.
[271] Seiford, L.M. (1996), “Data Envelopment Analysis: The Evolution of the State-of-
the-Art (1978-1995)”, Journal of Productivity Analysis, 7, 99-138.
[272] Seitz J. K. (1966), “Efficiency Measures for Steam-Electric Generating Plants”, Proceedings
of the Thirty Ninth Annual Meeting of the Western Farm Economics Associations,
1966, 143-151.
[274] Sengupta J.K. (1991), “Maximum Probability Dominance and Portfolio Theory”, Journal
of Optimization Theory and Applications, 71, 341-357.
[275] Sengupta, J.K. (1992), “A Fuzzy Systems Approach in Data Envelopment Analysis”,
Computers and Mathematical Applications, 24, 259-266.
[276] Sengupta, J.K., and Park, H.S. (1993), “Portfolio Efficiency tests Based on Stochastic
Dominance and Cointegration”, International Journal of Systems Science, 24, 2135-
2158.
[277] Sengupta, J.K. (1994), “Measuring Dynamic Efficiency Under Risk Aversion”, Euro-
pean Journal of Operational Research, 74, 61-69.
[278] Sengupta, J.K. (1995), Dynamics of Data Envelopment Analysis. Theory of Systems
Efficiency, Kluwer Academic Publishers, Dordrecht.
[279] Sengupta, J.K. (2000), Dynamic and Stochastic Efficiency Analysis, Economics of
Data Envelopment Analysis, World Scientific, Singapore.
[280] Sheather S.J., and Jones M.C. (1991), “A reliable data-based bandwidth selection
method for kernel density estimation”, Journal of the Royal Statistical Society, Series
B, 53:3, pp. 683-690.
[281] Shephard, R.W. (1953). Cost and Production Functions. Princeton, NJ: Princeton
University Press.
[282] Shephard, R.W. (1970). Theory of Cost and Production Function. Princeton, NJ:
Princeton University Press.
[283] Shephard, R.W. (1974). Indirect Production Functions. Princeton, NJ: Princeton Uni-
versity Press.
[285] Sickles R. C. (2005), “Panel estimators and the identification of firm-specific effi-
ciency levels in parametric, semiparametric and nonparametric settings”, Journal of
Econometrics, 126, 305-334.
[286] Silverman, B.W. (1986). Density Estimation for Statistics and Data Analysis, Chapman
and Hall, London.
[287] Silverman B.W., and Young G.A. (1987), “The Bootstrap: Smooth or Not to Smooth?”,
Biometrika, 74, 469-479.
[288] Simar L., (1992), “Estimating Efficiencies from Frontier Models with Panel Data: A
comparison of Parametric, Nonparametric and Semi-parametric Methods with Boot-
strapping”, The Journal of Productivity Analysis, 3, 167-203.
[289] Simar L., (1996) “Aspects of statistical Analysis in DEA-type frontier models”, The
Journal of Productivity Analysis, 7, 177-185.
[290] Simar, L. (2003a), “Detecting outliers in frontier models: a simple approach”, Journal
of Productivity Analysis, 20, 391-424.
[291] Simar, L. (2003b), “How to Improve the Performance of DEA/FDH Estimators in the
Presence of Noise?”, Discussion Paper 0323, Institut de Statistique, UCL, Belgium.
[292] Simar, L. and Wilson, P.W. (1998), “Sensitivity analysis of efficiency scores: how to
bootstrap in nonparametric frontier models”, Management Science, vol. 44, 1, 49-61.
[293] Simar, L. and Wilson, P.W. (1999a), “Some problems with the Ferrier/Hirschberg
Bootstrap Idea”, The Journal of Productivity Analysis, 11, 67-80.
[294] Simar, L. and Wilson, P.W. (1999b), “Of Course we Can Bootstrap DEA scores!
But does it mean anything? Logic Trumps Wishful Thinking”, The Journal of
Productivity Analysis, 11, 93-97.
[295] Simar, L. and Wilson, P.W. (1999c), “Estimating and Bootstrapping Malmquist In-
dices”, European Journal of Operational Research, 115, 459-471.
[296] Simar, L. and Wilson, P.W. (2000a), “Statistical Inference in Nonparametric Frontier
Models: The State of the Art”, The Journal of Productivity Analysis, 13, 49-78.
[297] Simar, L. and Wilson, P.W. (2000b), “A general methodology for bootstrapping in
non-parametric frontier models”, Journal of Applied Statistics, vol.27, 6, 779-802.
[298] Simar, L. and Wilson, P.W. (2001), “Testing restrictions in nonparametric efficiency
models”, Communications in Statistics, 30(1), 159-184.
[299] Simar, L. and Wilson, P.W. (2002), “Nonparametric tests of returns to scale”, European
Journal of Operational Research, 139, 115-132.
[300] Simar, L. and Wilson, P.W. (2003), “Estimation and Inference in Two-stage, Semi-
parametric Models of Production Processes”, Discussion Paper 0307, Institut de Sta-
tistique, UCL, Belgium, forthcoming in Journal of Econometrics.
[301] Simar, L. and P.W. Wilson (2005), “Estimation and Inference in Cross-Sectional Sto-
chastic Frontier Models”, Discussion paper 0524, Institut de Statistique, UCL.
[302] Simar, L. and P.W. Wilson (2006a), “Statistical Inference in Nonparametric Frontier
Models: recent Developments and Perspectives”, forthcoming in The Measurement
of Productive Efficiency, 2nd Edition, Harold Fried, C.A.Knox Lovell and Shelton
Schmidt, editors, Oxford University Press, 2006.
[303] Simar, L. and Wilson, P.W. (2006b), “Efficiency Analysis: The Statistical Approach”,
Manuscript, Institute of Statistics, UCL, Belgium.
[304] Simar, L. and V. Zelenyuk (2003), “Statistical Inference for Aggregates of Farrell-type
Efficiencies”, Discussion paper 0324, Institute of Statistics, UCL, Belgium, forthcom-
ing in Journal of Applied Econometrics.
[305] Simar, L. and V. Zelenyuk (2004), “On Testing Equality of Distributions of Technical
Efficiency Scores”, Discussion paper 0434, Institute of Statistics, UCL, Belgium.
[307] Simon, H. A. (1957), Models of Man, John Wiley and Sons, NY.
[308] Simonoff J.S., (1996), Smoothing methods in statistics, Springer series in Statistics,
NY.
[309] Sitorius, B.L. (1966), “Productive Efficiency and Redundant Factors of Production
in Traditional Agriculture of Underdeveloped Countries”, Proceedings of the Thirty
Ninth Annual Meeting of the Western Farm Economics Associations, 153-158.
[310] Schmidt, P., and R.C. Sickles (1984), “Production frontier and panel data”, Journal of
Business and Economic Statistics, 3, 171-203.
[311] Stevenson, R.E. (1980), “Likelihood Functions for Generalized Stochastic Frontier
Estimation”, Journal of Econometrics, 13(1), 57-66.
[312] Stigler, G.J. (1976), “The Xistence of X-Efficiency”, American Economic Review, 66
(1), 213-216.
[313] Swiss Re, (1996), “Deregulation and Liberalization of Market Access: The European
Insurance Industry on the Threshold of a New Era in Competition,” Sigma, no. 7 of
1996.
[314] Swiss Re, (2000a), “Japan’s Insurance Markets - A Sea Change,” Sigma, no. 8 of 2000.
[315] Swiss Re, (2000b), “Europe in Focus: Non-life Markets Undergoing Structural
Change,” Sigma, no. 3 of 2000.
[316] Taubes G. (1993), “Measure for measure in science”, Science, 14/05/93, Vol. 260, n.
5110, pp. 884-886.
[318] Thanassoulis, E. (2001) Introduction to the Theory and Application of Data Envelop-
ment Analysis, Kluwer Academic Publishers, Boston.
[319] Thrall, R.M. (1999), “What is the economic meaning of FDH?”, Journal of Produc-
tivity Analysis, 11, 243-250.
[320] Thursby J.G., and Kemp S. (2002), “Growth and productive efficiency of university
intellectual property licensing”, Research Policy, 31(1), 109-124.
[321] Timmer, C.P. (1971), “Using a Probabilistic Frontier Production Function to Measure
Technical Efficiency”, Journal of Political Economy, 79 (4), 776-794.
[322] Tornqvist, L. (1936), “The Bank of Finland’s Consumption Price Index”, Bank of
Finland Monthly Bulletin, 10, 1-8.
[323] Treynor, J.L. (1965), “How to Rate Management of Investment funds”, Harvard Busi-
ness Review, 43, 63-75.
[324] Triantis, K. and Girod, O. (1998), “A Mathematical Programming Approach for Mea-
suring Technical Efficiency in a Fuzzy Environment”, Journal of Productivity Analysis,
10, 85-102.
[325] Triantis, K and Vanden Eeckaut P. (2000), “Fuzzy Pair-wise Dominance and Impli-
cations for Technical Efficiency Performance Assessment”, Journal of Productivity
Analysis, 13, 207-230.
[326] Tulkens, H. (1993), “On FDH Efficiency Analysis: Some methodological Issues and
Applications to Retail Banking, Courts, and Urban Transit”, Journal of Productivity
Analysis, 4 (1/2), 183-210.
[329] Turchetti G. and Daraio, C. (2004), “How Deregulation Shapes Market Structure and
Industry Efficiency: The case of the Italian Motor Insurance Industry”, Geneva Papers
on Risk and Insurance, 29 (2), 202-218.
[330] van Raan, A.F.J. (1993), “Advanced bibliometric methods to assess research perfor-
mance and scientific development: basic principles and recent practical applications”,
Research Evaluation, 3:151.
[331] van Raan, A.F.J. (1997), “Scientometrics: state of the art”, Scientometrics, Vol. 38,
205-218.
[332] van den Broeck J., Koop G., Osiewalski J., and Steel M.F.J. (1994), “Stochastic frontier
models: a Bayesian perspective”, Journal of Econometrics, 61, 273-303.
[333] Vanden Eeckaut, P. (1997), Free Disposal Hull and Measurement of efficiency: Theory,
Application and Software, PhD Thesis, Faculté des Sciences Economiques, Sociales
et Politiques, Nouvelle série (229), Université Catholique de Louvain, Louvain-la-
Neuve, Belgium.
[334] Varian, H.R. (1984), “The Non-Parametric Approach to Production Analysis”, Econometrica,
52, 579-597.
[335] Varian, H.R. (1985), “Nonparametric Analysis of Optimizing Behaviour with Mea-
surement Error”, Journal of Econometrics, 30 (1/2), 445-458.
[336] Varian, H.R. (1990), “Goodness-of-Fit in Optimizing Models”, in Lewin, A.Y., and
Lovell, C.A.K., (eds.) Frontier Analysis: Parametric and Nonparametric Approaches,
Journal of Econometrics, 46 (1/2).
[337] Varian H.R. (1992) Microeconomic Analysis, 3rd edition, W. W. Norton & Company.
[339] Watson, G.S. (1964), “Smooth regression analysis”, Sankhya Series A, 26, 359-372.
[340] Wilkens, K., J. Zhu (2001), “Portfolio evaluation and benchmark selection: A mathe-
matical programming approach”, Journal of Alternative investments, 4 (1), 9-20.
[341] Williamson, O.E. (1964), The Economics of Discretionary Behavior: Managerial Ob-
jectives in a Theory of the Firm, Englewood Cliffs, NJ: Prentice-Hall.
[342] Wilson, P.W. (1995), “Detecting Influential Observations in Data Envelopment Analy-
sis”, Journal of Economics and Business, 6, 27-46.
[343] Wilson, P. W. (2005a), “FEAR 1.0: A Software Package for Frontier Efficiency
Analysis with R”, unpublished working paper, Department of Economics, Uni-
versity of Texas, Austin, Texas. Software and working paper downloadable at
http://www.eco.utexas.edu/faculty/Wilson/Software/FEAR/ .
[345] Wilson, P. W. (2005c), “FEAR 0.9 User’s Guide”, Department of Economics, Univer-
sity of Texas, Austin, Texas.
[346] Zhu, J. (1996), “Data envelopment analysis with preference structure”, Journal of the
Operational Research Society, 47, 136-150.
[347] Zucker, L.G., Darby, M.R., Armstrong, J. (1998), “Geographically localized knowledge:
Spillovers or markets?”, Economic Inquiry, XXXVI, 65-86.
Author Index
Acs, Z., 173
Adams, J.D., 168
Afriat, S.N., 17–18, 29
Aigner, D.J., 4, 29, 46
Alchian, A., 18
Allen, R., 40
Allison, P.D., 174
Amel, D., 139
Andersen, P., 41
Angulo-Meza, L., 40
Annaert, J., 193
Aragon, Y., 72, 74–75
Armstrong, J., 174
Athanassopoulos, A., 40
Audretsch, D.B., 174
Avveduto, S., 175
Banker, R.D., 15, 18–19, 31, 47, 98
Barnes, C., 139
Barnett, V., 81
Bartelsman, E.J., 97–98
Barth, J.R., 138
Basso, A., 193
Battese, G.E., 26
Becattini, G., 173
Berger, A.N., 138–139, 143
Bergson, A., 17
Bertoletti, P., 191
Bessent, A., 168
Bhadra, D., 18
Bickel, P.J., 57
Bjurek, H., 16
Bogetoft, P., 41
Boles, J.N., 17
Bonaccorsi, A., 168–170, 176, 184
Braun, T., 169, 178
Bressler, R.G., 17
Briec, W., 41, 193–194
Cantarelli, D., 191
Castells, M., 173
Caves, D.W., 16, 28
Cazals, C., 65–66, 68, 72, 75, 77, 96, 102
Cesari, R., 193
Chambers, R.G., 41
Chang, H., 19
Charnes, A., 2, 14–15, 17, 29, 31, 168
Chavas, J.P., 18
Cherchye, L., 37
Chevalier, J., 195
Choi, B., 193
Christensen, L.R., 16, 28, 186
Chu, S.F., 29
Chung, Y., 41
Clark, G.L., 173
Coelli, T., 26, 90, 92, 168, 186, 188
Collins, P.M.D., 169
Cooke, P., 173
Cooper, W.W., 2, 14–15, 17, 19, 25, 29, 31, 39, 168
Cornwell, C., 29
Cox, T.L., 18
Cummins, J.D., 139, 142–143
Dan Brumbaugh, R.Jr., 138
Daniel, H.D., 169
Dantzig, G.B., 17
Daouia, A., 5, 72, 74–75, 77, 86–88, 91, 94, 96, 103–104, 186, 192
Daraio, C., 6, 65–66, 70–71, 77, 96, 100–102, 113, 122, 144, 168–170, 176, 184, 193–194
Darby, M.R., 174
David, P.A., 175
De Alessi, L., 19
Debreu, G., 2, 14, 16–17, 24
Deckle, R., 138
Deprins, D., 2, 17, 30, 33–34, 86, 145
Desai, P., 193
Detzel, L.F., 196, 200, 216
Diewert, W.E., 16, 18, 28
Doms, M., 97–98
Dorfman, R., 2
Dorof, W.A., 169
Van den Broeck, J., 29, 46, 193
Van Leeuwen, T.N., 179
Van Raan, A.F.J., 169
Vanden Eeckaut, P., 29, 37, 40–41
Varian, H.R., 18, 191
Vennet, R.V., 193
Vincent, A., 13
Wagner, J., 196, 200, 216
Wallenius, J., 40, 168
Watson, G.S., 114
Weber, W., 90, 92, 168, 186
Weigand, R., 196, 200, 216
Weiner, C., 45, 48–49, 77
Weiss, M.A., 139, 142–143
Whinston, M.D., 191
Wilcox, J.A., 138
Wilkens, K., 193
Williamson, O.E., 18
Wilson, P.W., 4, 6, 19, 22, 25–26, 30, 43–46, 49, 52, 57–61, 63–64, 79, 99, 113, 151–153, 218
Yaisawarng, S., 6, 99
Young, G.A., 57
Zelenyuk, V., 16, 63, 100
Zenios, S.A., 139
Zhu, J., 39–40, 193
Zi, H., 143
Zucker, L.G., 174