• w.x: x-based prior weights for Mallows estimation (default is a vector of 1s);
• maxiter, mytol: maximum number of iterations and tolerance for the algorithm.
The function returns a list, whose components include:
• se, V: estimated standard errors and asymptotic variance matrix for the regression coefficients.
The Bianco and Yohai (1996) estimator minimizes

$$\sum_{i=1}^{n} \left[ \rho_k\{d(x_i^T\beta;\, y_i)\} + C(x_i^T\beta) \right],$$

where d(u; y) = −y log F(u) − (1 − y) log{1 − F(u)}, ρ_k is a bounded function and C(x_i^T β) is a bias-correction term. Bianco and Yohai (1996) proposed the following ρ function,
$$\rho_k(x) = \begin{cases} x - \dfrac{x^2}{2k} & \text{if } x \le k,\\[6pt] \dfrac{k}{2} & \text{otherwise,} \end{cases}$$
but stressed that other choices are possible. Croux and Haesbroeck (2003) extend the
Bianco and Yohai estimator by including weights for downweighting high-leverage points,
thus defining a bounded-influence estimator
$$\hat\beta_{\mathrm{WBY}} = \operatorname*{argmin}_{\beta} \sum_{i=1}^{n} w(x_i) \left[ \rho_k\{d(x_i^T\beta;\, y_i)\} + C(x_i^T\beta) \right]. \qquad (10)$$
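In R, the ρ function above is a one-liner (a direct transcription of the formula; the name rho.BY is ours):

> rho.BY <- function(x, k) ifelse(x <= k, x - x^2/(2*k), k/2)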
They suggested a decreasing function of robust Mahalanobis distances for the weights w(x_i), where the distances are computed using the Minimum Covariance Determinant (MCD) estimator (see Rousseeuw and Leroy, 1987). More precisely, the w(x_i) are obtained as follows. The MCD method seeks the h points whose covariance matrix has minimum determinant, and Croux and Haesbroeck (2003) suggest h = 3n/4, giving an estimator with 25% breakdown point. The method is implemented by the function cov.rob (or cov.mcd, which is a convenient wrapper) in the MASS library. If X is the design matrix, robust estimates of multivariate location and scale are obtained as follows.
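A minimal sketch, assuming the continuous covariates are collected in a matrix X.cont (the exact call used for the food stamp data is shown below):

> library(MASS)
> hp <- floor(nrow(X.cont) * 3/4) + 1
> mcd <- cov.rob(X.cont, quantile.used = hp, method = "mcd")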
Once the robust estimates of location and scale have been computed, we can obtain the robust distances RD_i. Finally, the weights are defined as w(x_i) = W(RD_i), where W is the weight function W(t) = I{t^2 ≤ χ^2_{p,0.975}}, with I_A denoting the indicator function of the set A.
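Continuing the sketch above, with p the number of columns of X.cont (recall that mahalanobis() returns squared distances):

> RD <- sqrt(mahalanobis(X.cont, center = mcd$center, cov = mcd$cov))
> w.x <- as.numeric(RD^2 <= qchisq(0.975, p))   # W(RD_i)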
Notice that the same weights could also be used in (8). Croux and Haesbroeck (2003) have implemented both the Bianco and Yohai estimator and their weighted version in public-domain S-PLUS functions, which can also be used in R with a few changes. The two functions are BYlogreg and WBYlogreg, respectively, which we slightly modified in
order to deal with dummy covariates in the design matrix. The arguments of the function
BYlogreg are
• x0cont: design matrix to use for computing the weighted MLE (if initwml=T)
• initwml: logical value for selecting one of the two possible methods for computing
the initial value: if initwml=T a weighted MLE, otherwise the classical MLE.
The function returns a list, including the components coef and sterror for the parameter estimates and standard errors. The function WBYlogreg has exactly the same arguments, except for initwml, as the weighted MLE is always used as the starting point.
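A typical call has the following pattern (a sketch based on the call used later for the vaso-constriction data; X is the full design matrix, whose intercept column is dropped, and y is the binary response):

> fit <- WBYlogreg(X[, -1], y)
> fit$coef      # parameter estimates
> fit$sterror   # standard errors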
Example: Food Stamp Data
This is a classical example in robust statistics; see, for example, Künsch et al. (1989). The food stamp data set, of sample size 150, has a binary response variable indicating participation in the US Food Stamp Program. The covariates included in the model are tenancy (Tenancy), supplemental income (SupInc), and log(monthly income + 1) (log(Inc+1)). The data are contained in the file foodstamp.txt. Let us start the analysis by getting the MLE.
> food <- read.table("foodstamp.txt", header = TRUE)
> food.glm <- glm(y ~ Tenancy + SupInc + log(Inc+1), binomial, food)
> summary(food.glm)
....
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.9264 1.6229 0.571 0.56813
Tenancy -1.8502 0.5347 -3.460 0.00054
SupInc 0.8961 0.5009 1.789 0.07365
log(Inc + 1) -0.3328 0.2729 -1.219 0.22280
The only significant coefficient seems to be that of the variable Tenancy. A look at some diagnostic plots is quite useful (see Figure 28).
> glm.diag.plots(food.glm)
It is clear that there is an observation with totally anomalous covariate values, which has
a very strong effect on the fit. In fact, observation 5 is the only one with a zero value for
the monthly income.
> food$Inc
[1] 271 287 714 521 0 518 458 1266 350 168 235 450 683 519
...
Any kind of robust method suitable for this data set must clearly bound the influence of the design points. We compute both the weights based on the hat matrix (used in the examples of Cantoni and Ronchetti, 2001) and those based on robust estimation of location and scale. Following the suggestions in the code by Croux and Haesbroeck, in the latter case we compute the weights using only the continuous covariate.
> X.food <- model.matrix(food.glm)
> w.hat.food <- sqrt(1 - hat(X.food))
> hp.food <- floor(nrow(X.food) * 0.75) + 1
> mcdx.food <- cov.rob(as.matrix(X.food[,-(1:3)]), quantile.used = hp.food,
                       method = "mcd")
> rdx.food <- sqrt(mahalanobis(as.matrix(X.food[,-(1:3)]),
center = mcdx.food$center, cov = mcdx.food$cov))
> vc.food <- sqrt(qchisq(0.975, 1))
> w.rob.food <- as.numeric(rdx.food <= vc.food)
The two sets of weights present some differences, with the hat-matrix-based ones providing a milder degree of downweighting:
[Figure 28 near here. Four diagnostic panels: residuals against the linear predictor, ordered residuals against quantiles of the standard normal, Cook statistic against h/(1 − h), and Cook statistic against case number.]

Figure 28: Food stamp data: diagnostic plots for maximum likelihood estimates
> mean(w.rob.food)
[1] 0.94
> mean(w.hat.food)
[1] 0.9864798
However, both types of weights reach their minimum value at observation 5. We now compute the robust estimates β̂_M, including a Huber-type version without any prior x-weights. We start the algorithm from the MLE and, following Cantoni and Ronchetti (2001), we set k = 1.2.
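The fits can be obtained along these lines (a sketch, assuming our logit.BI function with the argument order used later for the vaso-constriction data: starting values, design matrix, response, tuning constant k, and optional x-weights):

> food.hub <- logit.BI(food.glm$coef, X.food, food$y, 1.2)
> food.mal <- logit.BI(food.glm$coef, X.food, food$y, 1.2, w.hat.food)
> food.mal.wrob <- logit.BI(food.glm$coef, X.food, food$y, 1.2, w.rob.food)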
The standard errors are as follows.
> tab.se <- cbind(sqrt(diag(vcov(food.glm))), food.hub$se, food.mal$se,
                  food.mal.wrob$se)
> colnames(tab.se) <- c("MLE", "HUB", "MAL-HAT", "MAL-ROB")
> print(tab.se, digits = 3)
MLE HUB MAL-HAT MAL-ROB
(Intercept) 1.623 1.636 3.039 3.328
Tenancy 0.535 0.527 0.581 0.588
SupInc 0.501 0.516 0.553 0.561
log(Inc + 1) 0.273 0.276 0.519 0.572
We notice that the Mallows estimates are quite different from both the MLE and the Huber estimates. The estimated weights on the residuals, ψ_c(r_i)/r_i, where r_i is the Pearson residual, provide an explanation for this fact.
> wei.food <- cbind(food.hub$weights, food.mal$weights, food.mal.wrob$weights)
> cond <- apply(wei.food, 1, "<", 1)  # TRUE where a weight is below 1
> cond <- apply(cond, 2, sum)         # count weights below 1, per observation
> wei.food[(cond > 0), ]              # observations downweighted by some fit
[,1] [,2] [,3]
5 0.8412010 0.04237215 0.02127919
22 0.4953700 0.60685048 0.65761723
25 1.0000000 0.97042523 0.93395722
26 0.8020679 1.00000000 1.00000000
40 0.6464152 0.41587812 0.36433034
51 1.0000000 0.96380099 0.92400623
52 0.8144845 1.00000000 1.00000000
59 1.0000000 0.97674194 0.94117456
66 0.2543750 0.13502255 0.11816794
79 0.6857541 0.54321273 0.50018522
94 0.7980593 1.00000000 1.00000000
95 0.6679639 0.48234377 0.43440327
103 0.4653931 0.45762357 0.47048220
107 0.8014854 1.00000000 1.00000000
109 0.9518519 0.52815274 0.45262133
120 0.4756079 0.50482509 0.52859808
137 0.2884602 0.23841016 0.23198605
141 0.7969428 1.00000000 1.00000000
147 0.3144637 0.35221333 0.36859260
150 0.7920675 1.00000000 1.00000000
The weights based on the Huber-type estimates are quite different from the Mallows-type ones. In particular, Huber-type regression does not downweight observation 5 enough. Things are different if we choose another starting point for the algorithm. A sensible choice could be to start from a weighted MLE, obtained by selecting only the observations for which the robust distance weights W(RD_i) are equal to 1.
> food.glm.wml <- glm(y ~ Tenancy + SupInc + log(Inc+1), binomial, food,
                      subset = (w.rob.food == 1))
> summary(food.glm.wml)
....
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 5.6408 2.7665 2.039 0.041452
Tenancy -1.7749 0.5352 -3.317 0.000911
SupInc 0.6491 0.5181 1.253 0.210238
log(Inc + 1) -1.1123 0.4685 -2.374 0.017589
If we restart the algorithm from the weighted MLE, the Huber-type estimates are close to the Mallows ones, and the weight given to observation 5 is now much smaller.
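With logit.BI the restart simply amounts to changing the starting values (a sketch; food.hub.wml is our own name):

> food.hub.wml <- logit.BI(food.glm.wml$coef, X.food, food$y, 1.2)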
....
$coef
(Intercept) subx01 subx02 subx03
[1,] 5.824949 -1.832782 0.6703187 -1.148582
$sterror
[1] 3.3923786 0.5822091 0.5220386 0.5906120
In order to compare the various estimates, we follow the approach of Kordzakhia et al. (2001), who proposed to compare them by means of a goodness-of-fit discrepancy, the chi-square statistic based on the arcsine transformation
$$X^2_{\mathrm{arc}} = 4 \sum_{i=1}^{n} \left( \arcsin\sqrt{y_i} - \arcsin\sqrt{\hat\pi_i}\, \right)^2,$$

where $\hat\pi_i$ are the fitted probabilities. The values of the statistic for the various estimates show again the importance of using x-weights for this data set.
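The statistic is straightforward to compute; a minimal sketch, with X2.arc our own helper function:

> X2.arc <- function(y, prob) 4 * sum((asin(sqrt(y)) - asin(sqrt(prob)))^2)
> X2.arc(food$y, fitted(food.glm))   # discrepancy for the MLE fit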
Finally, the S-PLUS code of Cantoni (2004) gives the following results for the Mallows estimator using the x-weights (1 − h_ii)^{1/2} (here we used our own port of the code).
The coefficients and standard errors are essentially the same as those obtained with our function and stored in food.mal. The code by Cantoni (2004), however, is more reliable and offers a broad range of functions, including some functions for testing based on quasi-deviances (Cantoni and Ronchetti, 2004).
[Figure 29 near here: scatterplot of the vaso-constriction data, Rate against Volume, with separate symbols for y = 0 and y = 1; observations 4 and 18 are highlighted.]
There is a dramatic increase in both the coefficient values and the standard errors. Actually, without the two observations we are in a situation of quasi-complete separation (Albert and Anderson, 1984), with little overlap between the observations with y_i = 0 and those with y_i = 1. The model is nearly indeterminate. This is readily confirmed by Mallows (or Huber) estimation, which assigns low weights to both observations 4 and 18, and provides results similar to those obtained with the MLE after removing the influential points.
> X.vaso <- model.matrix(vaso.glm)
> vaso.mal <- logit.BI(vaso.glm$coef, X.vaso, vaso$y, 1.2, sqrt(1 - hat(X.vaso)))
> cbind(vaso.mal$coef, vaso.mal$se)
[,1] [,2]
(Intercept) -22.01822 22.83438
lVol 36.10633 40.69736
lRate 28.72225 30.03298
> vaso.mal$weights[c(4,18)]
4 18
3.726611e-05 1.544562e-04
The same happens with both versions of the Bianco and Yohai estimator. Once again, the near-indeterminacy is reflected in large increases of the coefficients and standard errors.
> vaso.WBY <- WBYlogreg(X.vaso[,-1], vaso$y)
> vaso.WBY
[Figure 30 near here. Four diagnostic panels: residuals against the linear predictor, ordered residuals against quantiles of the standard normal, Cook statistic against h/(1 − h), and Cook statistic against case number.]

Figure 30: Vaso-constriction data: diagnostic plots for maximum likelihood estimates
....
$coef
(Intercept) subx01 subx02
[1,] -6.859868 10.74855 9.3733
$sterror
[1] 10.07252 15.34863 12.80866
Notice, however, that alternative methods may give different results, as in the case of the OBRE (see Künsch et al., 1989).
References
Agostinelli, C. (2001), Wle: A package for robust statistics using weighted likelihood. R
News, 1/3, 32–38.
Agostinelli, C., Markatou, M. (1998), A one-step robust estimator for regression based
on the weighted likelihood reweighting scheme, Statistics & Probability Letters, 37,
341–350.
Becker, R.A., Chambers, J.M., Wilks, A.R. (1988), The New S Language, Wadsworth
and Brooks/Cole, Pacific Grove.
Bianco, A.M., Yohai, V.J. (1996), Robust estimation in the logistic regression model.
In: Rieder, H. (Ed.), Robust Statistics, Data Analysis, and Computer Intensive
Methods, Springer, pp. 17–34.
Cantoni, E. (2004), Analysis of robust quasi-deviance for generalized linear models, Jour-
nal of Statistical Software, 10, 1–9.
Cushny, A.R., Peebles, A.R. (1905), The action of optical isomers. II. Hyoscines, Journal of Physiology, 32, 501–510.
Croux, C., Haesbroeck, G. (2003), Implementing the Bianco and Yohai estimator for
logistic regression, Computational Statistics and Data Analysis, 44, 273–295.
Finney, D. J. (1947), The estimation from individual records of the relationship between
dose and quantal response, Biometrika, 34, 320–334.
Hampel, F.R., Ronchetti, E.M., Rousseeuw, P.J., Stahel, W.A. (1986), Robust Statistics:
The Approach Based on Influence Functions, Wiley.
Hawkins, D.M., Bradu, D., Kass, G.V. (1984), Location of several outliers in multiple
regression data using elemental sets, Technometrics, 26, 197–208.
Jørgensen, B. (1984), The delta algorithm and GLIM, International Statistical Review,
52, 283–300.
Kordzakhia, N., Mishra, G.D., Reiersølmoen, L. (2001), Robust estimation in the logistic
model, Journal of Statistical Planning and Inference, 98, 211–223.
Li, G. (1985), Robust regression, in Exploring Data Tables, Trends, and Shapes, eds. Hoaglin, D.C., Mosteller, F., Tukey, J.W., pp. 281–343, Wiley.
Marazzi, A. (1993), Algorithms, Routines, and S Functions for Robust Statistics, Wadsworth
and Brooks/Cole, Pacific Grove.
Markatou, M., Basu, A., Lindsay, B.G. (1998), Weighted likelihood equations with boot-
strap root search, Journal of the American Statistical Association, 93, 740–750.
McKean, J.W., Sheather, S.J., Hettmansperger, T.P. (1993), The use and interpretation of residuals based on robust estimation, Journal of the American Statistical Association, 88, 1254–1263.
Rousseeuw, P.J., Leroy, A.M. (1987), Robust Regression and Outlier Detection, Wiley.
Staudte, R.G., Sheather, S.J. (1990), Robust Estimation and Testing, Wiley.
Street, J.O., Carroll, R.J., Ruppert, D. (1988), A note on computing robust regression
estimates via iteratively reweighted least squares. American Statistician, 42, 152–
154.
Venables, W. N., Ripley, B. D. (2002), Modern Applied Statistics with S. Fourth edition.
Springer.