Volker Blobel
Banff: Statistical Inference Problems in High Energy Physics and Astronomy
1. Global analysis of data from HEP experiments
Systematic errors are at the origin of the unsatisfactory situation that arises when data from many experiments
are used by theoreticians in a global analysis and parameter estimation, and when attempts are made
to determine e.g. the uncertainties of predictions from parton distribution functions.
• It is difficult to estimate systematic errors correctly, and the estimated errors are only as good
as the model used for the systematic errors.
• It is difficult to construct a correct χ² expression for the estimation of parameters that takes
the systematic contributions into account.
• The construction of the χ² expression is often done incorrectly and, if published in a
respected journal, it often becomes the model for the next generation of scientists.
The situation is e.g. described in the paper: D. Stump et al., Phys. Rev. D65, 014012
Examples with large ∆χ²
The large, artificial and arbitrary magnification of errors is hardly acceptable – the procedure points
to a deep problem in the whole data analysis. Two examples from parton distribution fits:
[Two figures: ∆χ² as a function of the fitted quantity, for W production at the Tevatron and for α_S(M_Z²).]
Both curves are parabolas to a very good approximation over a range of ∆χ² > 100 . . .
. . . while usually one would consider only a range of ∆χ² ≈ 4, corresponding to two standard deviations.
“ . . . determine the increase in χ²_global that corresponds to our estimated uncertainty ∆σ_W in the σ_W
prediction. . . . corresponds to ∆χ²_global ≈ 180.”
Data uncertainties in HEP experiments
Statistical errors: The statistical errors are given by the Poisson statistics of the event numbers (n ± √n),
without correlations between bins . . .
. . . but numbers corrected for finite resolution are correlated =⇒ Correlated statistical errors.
Normalization error: There is an uncertainty in the factor used to convert event numbers to
cross sections. This so-called normalization error applies to cross sections and to statistical
errors of cross sections as well. Because of its origin – product of many factors, each with some
uncertainty – the factor perhaps follows a log-normal distribution due to the multiplicative central
limit theorem.
Systematic errors: There are uncertainties in the detector behaviour, e.g. energy measurements
by a calorimeter may have a general relative error of a few %. Often the experiment's analysis is
repeated with a ± few % relative change of e.g. the calorimeter data; from the change in the result,
error contributions for all data are related to the single systematic uncertainty. A single error
contribution is a rank-1 contribution to the covariance matrix of the data.
Correlated statistical errors, unfolding error: The smearing effect of the detector is usually
corrected for in a way which, due to the smoothing aspect, introduces at least positive correlations
between neighbouring points.
Several different error contributions are reported by the experiments, but some may be missing; the
unfolding error (with positive correlations) usually remains unpublished.
2. Normalization errors (multiplicative)
Normalization error: There is an uncertainty in the factor used to convert event numbers to
cross sections. This so-called normalization error applies to cross sections and to statistical
errors of cross sections as well. Because of its origin – product of many factors, each with some
uncertainty – the factor perhaps follows a log-normal distribution due to the multiplicative central
limit theorem.
From a paper: “. . . Then, in addition, the fully correlated normalization error of the experiment is
usually specified separately. For this reason, it is natural to adopt the following definition for the
effective χ² (as done in previous . . . analyses)
χ²_global = Σ_n w_n χ²_n(a)        (n labels the different experiments),

χ²_n(a) = ((1 − N_n) / σ_n^N)² + Σ_ℓ ((N_n D_nℓ − T_nℓ(a)) / σ_nℓ^D)²
For the nth experiment, D_nℓ, σ_nℓ^D, and T_nℓ(a) denote the data value, measurement uncertainty
(statistical and systematic combined) and theoretical value (dependent on {a}) for the ℓth data point,
σ_n^N is the experimental normalization uncertainty, and N_n is an overall normalization factor (with
default value 1) for the data of experiment n. ...”
The nuisance parameter N_n should be applied as a factor to T_nℓ(a) instead of D_nℓ, or
alternatively to both D_nℓ and σ_nℓ^D; otherwise a normalization bias is introduced.
Common normalisation errors example from a publication
“Data are y1 = 8.0 ± 2% and y2 = 8.5 ± 2%, with a common (relative) normalisation error of ε = 10%.
The mean value (constraint ŷ1 = ŷ2) resulting from a χ² minimisation of χ² = ∆ᵀ V⁻¹ ∆, with
∆ = (y1 − ȳ, y2 − ȳ)ᵀ, is ȳ = 7.87 ± 0.81, i.e. smaller than both y1 and y2.
“. . . that including normalisation errors in the correlation matrix will produce a fit which
is biased towards smaller values . . . the effect is a direct consequence of the hypothesis to
estimate the empirical covariance matrix, namely the linearisation on which the usual error
propagation relies.”
But the matrix V_a is wrong! Correct model: the normalisation errors ε · value are identical:

V_b = diag(σ1², σ2²) + ε² ȳ² · [[1, 1], [1, 1]] = [[σ1² + ε²ȳ², ε²ȳ²], [ε²ȳ², σ2² + ε²ȳ²]]
Plot of one measured value vs. the other measured value, with the assumed covariance ellipse; the
mean value is on the diagonal.
[Two figures. Left: the axis of the ellipse is tilted w.r.t. the diagonal and the ellipse touches the
diagonal at a biased point. Right: the axis of the ellipse is at ≈ 45° and the ellipse touches the
diagonal at the correct point.]
The result of a χ² minimisation may depend critically on details of the model implementation!
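A minimal numerical sketch of this example (assuming numpy; the helper name weighted_mean is ad hoc): the covariance matrix built from the two measured values reproduces the biased mean 7.87 ± 0.81 quoted above, while the common-value model V_b returns the ordinary weighted mean.

```python
import numpy as np

# Data of the example: y1 = 8.0 +- 2%, y2 = 8.5 +- 2%, common normalisation error eps = 10%
y = np.array([8.0, 8.5])
sig = 0.02 * y                     # individual errors
eps = 0.10                         # common relative normalisation error

def weighted_mean(V, y):
    """Least-squares mean ybar minimising (y - ybar)^T V^-1 (y - ybar)."""
    Vinv = np.linalg.inv(V)
    one = np.ones_like(y)
    var = 1.0 / (one @ Vinv @ one)
    return var * (one @ Vinv @ y), np.sqrt(var)

# V_a: normalisation term built from the two *measured* values (the biased model)
Va = np.diag(sig**2) + eps**2 * np.outer(y, y)

# V_b: correct model, the normalisation errors eps * value are identical (one common value)
ybar0 = np.average(y, weights=1.0 / sig**2)          # common value, here the plain weighted mean
Vb = np.diag(sig**2) + eps**2 * ybar0**2 * np.ones((2, 2))

print("V_a: mean = %.2f +- %.2f" % weighted_mean(Va, y))   # ~7.87 +- 0.81, below both y1 and y2
print("V_b: mean = %.2f +- %.2f" % weighted_mean(Vb, y))   # ~8.23, the ordinary weighted mean
```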
The method with a nuisance parameter . . .
Another method often used is to define
χ²_a = Σ_k (f · y_k − ȳ)² / σ_k² + (f − 1)² / ε² ,
The χ² definition for this problem is

χ²_b = Σ_k (y_k − f · ȳ)² / σ_k² + (f − 1)² / ε²

[Figure: the χ² curves as a function of the mean ȳ.]
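A hedged numerical sketch (assuming scipy is available) that minimises both definitions for the two-point example above: χ²_a, which applies the factor f to the data, reproduces the biased mean, while χ²_b, which applies it to the expectation, does not.

```python
import numpy as np
from scipy.optimize import minimize

y = np.array([8.0, 8.5])
sig = 0.02 * y
eps = 0.10

def chi2_a(p):                      # factor f applied to the data y_k  -> biased mean
    ybar, f = p
    return np.sum((f * y - ybar)**2 / sig**2) + (f - 1.0)**2 / eps**2

def chi2_b(p):                      # factor f applied to the expectation ybar -> unbiased mean
    ybar, f = p
    return np.sum((y - f * ybar)**2 / sig**2) + (f - 1.0)**2 / eps**2

for name, fun in (("chi2_a", chi2_a), ("chi2_b", chi2_b)):
    res = minimize(fun, x0=[8.0, 1.0], method="Nelder-Mead")
    print("%s: mean = %.3f  f = %.3f  chi2_min = %.2f" % (name, res.x[0], res.x[1], res.fun))
# chi2_a gives a mean ~7.87 (pulled below both measurements), chi2_b gives ~8.23
```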
The log-normal distribution . . . and the normalisation error
The normalisation factor determined in an experiment is a product of factors (luminosity, detector
acceptance, efficiency, . . . ) rather than a sum of random variables. According to the multiplicative
central limit theorem the product of positive random variables follows the log-normal distribution,
i.e. the logarithm of the normalisation factor follows the normal distribution.
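A small simulation sketch of the multiplicative central limit theorem; the number of factors (eight) and their 5% spreads are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Product of several positive factors (luminosity, acceptance, efficiency, ...),
# each assumed here to be known to 5% -- the number of factors is invented.
factors = rng.normal(loc=1.0, scale=0.05, size=(100_000, 8))
norm = np.prod(factors, axis=1)

def skew(a):
    return float(((a - a.mean())**3).mean() / a.std()**3)

# The product is asymmetric (log-normal-like), its logarithm is close to Gaussian.
print("normalisation factor: std = %.3f  skewness = %.2f" % (norm.std(), skew(norm)))
print("log(factor):          std = %.3f  skewness = %.2f" % (np.log(norm).std(), skew(np.log(norm))))
```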
Asymmetric normalisation errors
Proposed method to take the normalisation error ε into account when data from more than one
experiment are combined:
Introduce one additional factor α for each experiment as a nuisance parameter, which has been
measured to be α = 1 ± ε, and modify the expectation according to
f_i = α · f(x_i, a)
3. Additive systematic errors
Systematic errors: There are uncertainties in the detector behaviour, e.g. energy measurements
by a calorimeter may have a general relative error of a few %. Often the experiment's analysis is
repeated with a ± few % relative change of e.g. the calorimeter data; from the change in the result,
error contributions for all data are related to the single systematic uncertainty. A single error
contribution is a rank-1 contribution to the covariance matrix of the data.
Experimental method: e.g. run the MC for each systematic source (Unisim) with the corresponding
constant varied by 1 σ and redetermine the result – this determines the signed shifts s_i of the data values y_i.
1. Method: Modify the covariance matrix to include the contribution(s) due to systematic errors,

V_a = V_stat + V_syst     with     V_syst = s sᵀ (rank-1 matrix)   or   Σ_k s_k s_kᵀ ,

and use the inverse matrix V⁻¹ as weight matrix in the χ² function. (→ simplified calculation of the inverse)
2. Method (recommended): Introduce one nuisance parameter β, assumed to be measured as
0 ± 1, for each systematic error source, and make the fit with (see the sketch below)

S(a) = Σ_i (y_i + β s_i − α · f(x_i, a))² / σ_i² + β²
Other method: Multisim: vary all systematic parameters randomly using their assumed probability distribution and
redetermine the result.
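A minimal sketch of the recommended Method 2 with a single systematic source β; the straight-line model, the pseudo-data and all numerical values are invented, and the normalisation factor α is fixed to 1 here for brevity.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)

# Invented setup: straight-line model f(x, a) and one additive systematic source with
# signed shifts s_i (quadratic shape, which the straight line cannot absorb).
x = np.linspace(1.0, 10.0, 10)
f = lambda x, a: a[0] + a[1] * x
a_true = (2.0, 0.5)
sig = np.full_like(x, 0.3)                      # statistical errors
s = 0.1 * (x - x.mean())**2                     # invented signed shifts s_i

# pseudo-data generated with a genuine -1 sigma systematic shift plus statistical noise
y = f(x, a_true) - 1.0 * s + rng.normal(0.0, sig)

def S(p):
    a0, a1, beta = p
    resid = (y + beta * s - f(x, (a0, a1))) / sig
    return np.sum(resid**2) + beta**2           # beta assumed to be measured as 0 +- 1

res = minimize(S, x0=[0.0, 0.0, 0.0], method="BFGS")
print("a = (%.2f, %.2f)  beta = %.2f  S_min = %.1f" % (res.x[0], res.x[1], res.x[2], res.fun))
# beta comes out near +1, absorbing the built-in systematic shift
```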
Simplified calculation of inverse
Assume that the inverse A⁻¹ of the n-by-n matrix A is known. If A is modified by a low-rank update
of the form below, the corresponding change in A⁻¹ can be calculated faster with the formula below.
Woodbury formula: U and V are n-by-k matrices with k < n and usually k ≪ n.
(A + U Vᵀ)⁻¹ = A⁻¹ − A⁻¹ U (1 + Vᵀ A⁻¹ U)⁻¹ Vᵀ A⁻¹
W.H.Press et al., Numerical Recipes, The Art of Scientific Computing, Cambridge University Press
“For larger k the direct methods may be faster and more accurate because of the stabilizing advantages
of pivoting.”
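A quick numerical check of the formula with numpy; choosing U = V with the shift vectors s_k as columns corresponds to the systematic contribution Σ_k s_k s_kᵀ of the previous slide (all numbers here are invented).

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 200, 3

A = np.diag(rng.uniform(1.0, 2.0, n))        # e.g. a known (here diagonal) covariance matrix
Ainv = np.diag(1.0 / np.diag(A))             # its inverse is already known
U = rng.normal(size=(n, k))                  # columns play the role of the shift vectors s_k
V = U                                        # so that U V^T = sum_k s_k s_k^T

# Woodbury: (A + U V^T)^-1 = A^-1 - A^-1 U (1 + V^T A^-1 U)^-1 V^T A^-1
core = np.linalg.inv(np.eye(k) + V.T @ Ainv @ U)
woodbury = Ainv - Ainv @ U @ core @ V.T @ Ainv

direct = np.linalg.inv(A + U @ V.T)
print("max |Woodbury - direct| = %.1e" % np.abs(woodbury - direct).max())   # agreement to rounding
```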
Example from an experimental paper
Systematic uncertainties:
• “Correlated systematic uncertainties:” e.g. −0.2%, −0.7%, +1.0%, +0.9%, −0.6%, . . . (Σ = 1.6%)
– electron energy
– electron angle
– hadronic calibration
– calorimeter noise contribution
– photoproduction background
• “Uncorrelated systematic uncertainties:” e.g. 0.4%, 0.8%, . . . (Σ = 2.1%)
Example from analysis papers
“All the experiments included in our analysis provide fully correlated systematics, as well as normalization
errors. The covariance matrix can be computed from these as
cov_ij = Σ_{k=1}^{N_sys} σ_{i,k} σ_{j,k} + F_i F_j σ_N² + δ_ij σ_{i,t}² ;
where F_i, F_j are central experimental values, σ_{i,k} are the N_sys correlated systematics, σ_N is the total
normalization uncertainty, and the uncorrelated uncertainty σ_{i,t} is the sum of the statistical uncertainty
σ_{i,s} and the N_u uncorrelated systematic uncertainties (when present) . . . ”
The inverse of the covariance matrix cov above is used as a weight matrix for a χ² calculation.
“ . . . However . . . correlations between measurement errors, and correlated theoretical errors, are not
included in its definition.”
“ . . . Instead, the evaluation of likelihoods and estimation of global uncertainty will be carried out
. . . after sets of optimal sample PDF’s for the physical variable of interest have been obtained.”
Comments: The quadratic combination of the statistical and systematic measurement uncertainties neglects
the known correlation inherent in the systematic effect. Neither unbiased optimal parameter values
nor a usable χ² or parameter errors can be expected from this χ² function.
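A short sketch (all sizes and numbers invented) that simply builds the quoted covariance matrix, illustrating that the correlated part is a low-rank contribution on top of the diagonal uncorrelated term.

```python
import numpy as np

rng = np.random.default_rng(4)
npts, nsys = 20, 5                                   # invented sizes

F = rng.uniform(1.0, 10.0, npts)                     # central experimental values F_i
sig_sys = 0.02 * F[:, None] * rng.uniform(0.5, 1.5, (npts, nsys))   # sigma_{i,k}
sig_N = 0.02                                         # total normalisation uncertainty
sig_t = 0.03 * F                                     # uncorrelated uncertainty sigma_{i,t}

# cov_ij = sum_k sigma_{i,k} sigma_{j,k} + F_i F_j sigma_N^2 + delta_ij sigma_{i,t}^2
cov = sig_sys @ sig_sys.T + sig_N**2 * np.outer(F, F) + np.diag(sig_t**2)
weight = np.linalg.inv(cov)                          # used as weight matrix in the chi^2

print("rank of the correlated part : %d" % np.linalg.matrix_rank(cov - np.diag(sig_t**2)))
print("largest correlation |rho_ij|: %.2f"
      % np.max(np.abs(cov / np.sqrt(np.outer(np.diag(cov), np.diag(cov))) - np.eye(npts))))
```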
4. Correlated statistical errors
Statistical deviations per bin of measured quantities from event numbers (n ± √n) are independent.
The covariance matrix is diagonal.
but
“. . . the selected event samples are corrected for detector acceptance and migration using
the simulation and are converted to bin-centred cross sections. . . . The bins used in the
measurement are required to have stability and purity larger than 30 %. . . . The stability
(purity) is defined as the number of simulated events which originate from a bin and which
are reconstructed in it, divided by the number of generated (reconstructed) events in that
bin . . . ”
• The covariance matrix for corrected data is non-diagonal and the variances are magnified. This
is visible in the eigenvalue spectrum of the migration matrix.
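A small simulation sketch of this effect; bin contents, resolution and binning are invented, and the correction here is a plain matrix inversion (with inversion the neighbour correlations come out negative; the smoothing of a regularised unfolding tends to make them positive, as stated earlier).

```python
import numpy as np

# Sketch: Gaussian smearing with resolution comparable to the bin width, followed by a
# correction of the measured counts with the inverse migration matrix (illustration only).
nbins = 20
idx = np.arange(nbins)
sigma_over_binwidth = 1.0

# M[i, j]: probability that an event from true bin j is reconstructed in bin i
M = np.exp(-0.5 * ((idx[:, None] - idx[None, :]) / sigma_over_binwidth) ** 2)
M /= M.sum(axis=0)

n_true = np.full(nbins, 1000.0)
n_meas = M @ n_true
V_meas = np.diag(n_meas)                   # Poisson: diagonal covariance, n +- sqrt(n)

Minv = np.linalg.inv(M)
V_corr = Minv @ V_meas @ Minv.T            # covariance of the corrected numbers

rho = V_corr / np.sqrt(np.outer(np.diag(V_corr), np.diag(V_corr)))
sv = np.linalg.svd(M, compute_uv=False)
print("variance magnification, central bin: %.0f" % (V_corr[10, 10] / n_meas[10]))
print("neighbour-bin correlation:           %+.2f" % rho[10, 11])   # non-diagonal covariance
print("singular values of M: largest %.2f, smallest %.3g" % (sv[0], sv[-1]))
```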
Why the cov matrix is non-diagonal . . .
[Figures (binwidth = σ): eigenvalues as a function of σ/binwidth, and the eigenvalue spectrum vs. the
index of the eigenvalue.]
5. Parameter uncertainties: profile likelihoods and χ²
Covariance matrix: parameter uncertainties and correlations are given by the covariance matrix V
of the fit, obtained by inversion of the matrix of second derivatives (Hessian) of the log-likelihood
function (Fisher information I = V⁻¹).
The covariance matrix is usually assumed to be sufficient to describe the parameter uncertainties.
χ² contour: the surface of the error ellipsoid corresponds to the contour χ² = χ²_minimum + 1.
Function of parameters: the uncertainty of a function g(a) of the parameters is determined by the
error propagation formula (derivatives of g(a) w.r.t. the parameters, together with the covariance matrix V).
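A minimal numerical sketch of this error propagation; the parameter values, the covariance matrix and the derived quantity g(a) = a0 · a1 are all invented for illustration.

```python
import numpy as np

a = np.array([1.2, 0.8])                      # fitted parameter values (invented)
V = np.array([[0.04, 0.01],
              [0.01, 0.09]])                  # their covariance matrix (invented)

g = lambda a: a[0] * a[1]                     # some derived quantity g(a)

# numerical derivatives of g w.r.t. the parameters
h = 1e-6
grad = np.array([(g(a + h * np.eye(2)[j]) - g(a - h * np.eye(2)[j])) / (2 * h) for j in range(2)])

sigma_g = np.sqrt(grad @ V @ grad)            # sigma_g^2 = (dg/da)^T V (dg/da)
print("g(a) = %.3f +- %.3f" % (g(a), sigma_g))
```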
Profile likelihood for a function
The profile likelihood for a function g(a) of the parameters, e.g. the W production cross section σ_W at
the Tevatron, or α_S(M_Z²), can be calculated by the use of the Lagrange multiplier method (Lagrange,
1736–1813):
For many fixed values g_fix of the function g(a) the likelihood function is minimized. The standard
method is to define a Lagrange function
S(a) + λ · g(a) ,
to find the stationary point w.r.t. the parameters a and the Lagrange multiplier λ, given g_fix, and,
after minimization, to calculate the corresponding fixed value of g(a) (this allows the use of MINUIT).
The constraint defines a set of parameter values for each value of g_fix, e.g. σ_W.
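A hedged sketch of such a profile scan for a toy quadratic S(a) with correlated parameters; all numbers are invented, and the equality constraint g(a) = g_fix is handed to scipy's SLSQP, which is equivalent to the Lagrange-multiplier formulation above.

```python
import numpy as np
from scipy.optimize import minimize

# Toy quadratic chi^2 S(a) around a fitted point a_hat, and the profile of g(a) = a0 + a1.
H = np.array([[4.0, 1.5],
              [1.5, 2.0]])                     # Hessian at the minimum (invented)
a_hat = np.array([1.0, 2.0])                   # fitted parameter values (invented)
S = lambda a: float((a - a_hat) @ H @ (a - a_hat))
g = lambda a: a[0] + a[1]

for g_fix in np.linspace(2.0, 4.0, 9):
    # the equality constraint fixes g(a) = g_fix; SLSQP solves the constrained problem
    con = {"type": "eq", "fun": lambda a, gf=g_fix: g(a) - gf}
    res = minimize(S, x0=a_hat, method="SLSQP", constraints=[con])
    print("g_fix = %.2f   profile chi2 = %.3f" % (g_fix, res.fun))
# the points where the profile rises by 1 above its minimum give the 1 sigma interval for g
```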
Comments on a recent paper
Global fit: The overall (“global”) fit is done, taking into account the normalization errors, but
neglecting certain systematic error contributions (non-diagonal) in the experimental data.
Why are the non-diagonal error contributions neglected, but later used?
Profile χ²-function: In the determination of the profile χ²-function, however, the normalization is
fixed at the previously fitted value.
Fixing certain parameters will not result in a correct profile.
Single experiment analysis: Afterwards a χ²-analysis is done, separately for each experiment,
now taking into account the systematic error contributions (non-diagonal) in the experimental
data.
For each experiment the “profile” χ²-function is evaluated, however with the parameters from
the global fit.
Each experiment has a different parameter covariance matrix, depending on the
kinematical region and the accuracy. Evaluating the χ²-function using parameter
sets from the global fit will not result in a correct profile.
From papers . . .
“In full global fit art in choosing correct ∆χ² given complication of errors. Ideally ∆χ² = 1,
but unrealistic.”
“ . . . and ∆χ² is the allowed variation in χ² . . . and a suitable choice of ∆χ² . . . and ∆χ²
is the allowed deterioration in fit quality for the error determination.”
“. . . Our standard PDF set S0 is a parametrized fit to 1295 data points with 16 fitting
parameters. The minimum of χ²_global is approximately 1200. Naively, it seems that an
increase of χ²_global by merely 1, say from 1200 to 1201, could not possibly represent a
standard deviation of the fit. Naively one might suppose that a standard deviation would
have ∆χ² ∼ √1295 rather than 1. However this is a misconception. If the errors are
uncorrelated (or if the correlations are incorporated into χ²) then indeed ∆χ² = 1 would
represent a standard deviation. But this theorem is irrelevant to our problem, because the
large correlations of systematic errors are not taken into account in χ²_global. . . . ”
(Phys. Rev.)
Use of redundant parameters
For a parameterization with A_− ∼ 0.2, δ_− ∼ 0.3 and η_− fixed at ∼ 10, a change of δ_− changes both shape and normalisation.
“We found our input parameterization was sufficiently flexible to accommodate data, and
indeed there is a certain redundancy evident.”
In the case of highly correlated, redundant parameters the Hessian will be (almost) sin-
gular, inversion may be impossible and the convergence of the fit is doubtful. Redundant
parameters have to be avoided!
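A tiny illustration with an invented model in which two parameters enter only through their sum, so that the Hessian of the χ² is singular and its inversion fails.

```python
import numpy as np

# Invented model with exactly redundant parameters: f(x; a) = (a0 + a1) * x,
# so only the sum a0 + a1 is determined and the chi^2 Hessian is singular.
x = np.linspace(1.0, 10.0, 50)
J = np.column_stack([x, x])                 # df/da0 = df/da1 = x
H = J.T @ J                                 # Hessian (unit errors assumed)

print("eigenvalues of the Hessian:", np.linalg.eigvalsh(H))   # one eigenvalue is (numerically) zero
print("condition number:", np.linalg.cond(H))                 # inf or astronomically large
try:
    np.linalg.inv(H)
except np.linalg.LinAlgError as err:
    print("inversion failed:", err)
```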
6. Outliers and their influence on the fit
“Everyone believes in the normal law of errors, the experimenters because they think it
is a mathematical theorem, the mathematicians because they think it is an experimental
fact.” [Poincaré]
Outliers – single unusually large or small values in a sample – are dangerous and will usually,
because of their large influence, introduce a bias in the result:
A method for outlier treatment: M-estimation, closely related to the maximum-likelihood method.
For data with a probability density pdf(z) the method of maximum likelihood requires the minimization of

S(a) = − Σ_{i=1}^{n} ln pdf(z_i) = Σ_{i=1}^{n} ρ(z_i)

with ρ(z) = − ln pdf(z). For a Gaussian distribution ρ(z) = ½ z². The function ρ(z) is modified in
M-estimation by down-weighting.
M-estimates
Abbreviation:   z_i = (y_i − f(x_i; a)) / σ_i     (∼ N(0, 1) for a Gaussian measurement)

Least squares:  minimize  Σ_i ½ z_i² ,    solve  Σ_i [(y_i − f(x_i; a)) / σ_i²] · ∂f/∂a_j = 0 ,   j = 1, 2, . . . , p

M-estimates:    minimize  Σ_i ρ(z_i) ,    solve  Σ_i w(z_i) · [(y_i − f(x_i; a)) / σ_i²] · ∂f/∂a_j = 0 ,   j = 1, 2, . . . , p

with influence function ψ(z) = dρ/dz and additional weight w(z) = ψ(z)/z.

ρ(z) = ½ z² ,  ψ(z) = z ,  w(z) = 1 :  case of least squares.

Requires iteration (non-linearity!), e.g. with the weight w(z) calculated from the previous values.
Influence functions and weights . . . as a function of z
z ≡ (y − f(x)) / σ
[Figures: influence functions and weights for least squares, Cauchy and Tukey, as functions of z.]
Commonly used M-estimators
Least squares:     ρ(z) = ½ z²                           ψ(z) = z                       w(z) = 1
Cauchy:            ρ(z) = (c²/2) ln(1 + (z/c)²)          ψ(z) = z / (1 + (z/c)²)        w(z) = 1 / (1 + (z/c)²)
Tukey (|z| ≤ c):   ρ(z) = (c²/6) (1 − [1 − (z/c)²]³)     ψ(z) = z [1 − (z/c)²]²         w(z) = [1 − (z/c)²]²
Tukey (|z| > c):   ρ(z) = c²/6                           ψ(z) = 0                       w(z) = 0
Huber (|z| ≤ c):   ρ(z) = z²/2                           ψ(z) = z                       w(z) = 1
Huber (|z| > c):   ρ(z) = c (|z| − c/2)                  ψ(z) = c · sign(z)             w(z) = c/|z|
M-estimation: reduces the influence of outliers and improves the fitted parameter values and
uncertainties . . . but the final value of the χ²-function does not follow the standard χ² distribution.
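A minimal sketch of M-estimation by iteratively reweighted least squares, using the Cauchy weight from the table above (c = 2.3849 is a commonly used tuning constant); the straight-line model, the data and the outliers are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(5)

# Invented data: straight line with Gaussian errors and a few gross outliers.
x = np.linspace(0.0, 10.0, 30)
a_true = np.array([1.0, 0.7])
sig = np.full_like(x, 0.5)
y = a_true[0] + a_true[1] * x + rng.normal(0.0, sig)
y[[5, 17, 25]] += 8.0                          # outliers

A = np.column_stack([np.ones_like(x), x])      # design matrix of the linear model

def weighted_lsq(w):
    """Weighted least-squares solution with per-point weights w(z_i)."""
    W = w / sig**2
    return np.linalg.solve(A.T @ (W[:, None] * A), A.T @ (W * y))

c = 2.3849
a = weighted_lsq(np.ones_like(x))              # start from ordinary least squares
for _ in range(20):                            # iterate: the non-linearity enters through w(z)
    z = (y - A @ a) / sig
    w = 1.0 / (1.0 + (z / c)**2)               # Cauchy weight from the previous residuals
    a = weighted_lsq(w)

print("least squares :", weighted_lsq(np.ones_like(x)))
print("M-estimate    :", a)                    # much less affected by the outliers
```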
Conclusion
Statistical errors of experimental data: The published statistical errors are often too optimistic,
because correlations (especially between neighbouring bins) are neglected.
Construction of the χ²-function: Each error contribution should be taken into account with the
correct underlying model for that contribution.