
by numerical differentiation. In our function logit.BI we use the simple numerical
routine num.deriv from the sn library. Hence, the updating of the beta coefficients
is essentially performed by the commands

> g.old <- g.fun(beta.old, X, y, offset, w.x, k1)


> J.old <- num.deriv(beta.old, "g.fun", X = X, y = y,
offset = offset, w.x = w.x, k1 = k1)
> beta.new <- beta.old - qr.solve(J.old, g.old)
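
Inside logit.BI these commands are wrapped in a simple iteration controlled by the arguments maxiter and mytol described below. A minimal sketch of such a loop (for illustration only, not the actual function body) is:

## Schematic Newton-type iteration around the updating step shown above
beta.old <- beta.in
for (it in 1:maxiter) {
  g.old <- g.fun(beta.old, X, y, offset, w.x, k1)
  J.old <- num.deriv(beta.old, "g.fun", X = X, y = y,
                     offset = offset, w.x = w.x, k1 = k1)
  beta.new <- beta.old - qr.solve(J.old, g.old)
  if (max(abs(beta.new - beta.old)) < mytol) break   # stop when the step is small
  beta.old <- beta.new
}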

The logit.BI function has the following usage

logit.BI(beta.in, X, y, k1, offset, w.x, maxiter, mytol)

and the various arguments are:

• beta.in: initial value for β;

• X, y: design matrix, response vector;

• k1: tuning constant, default is 1.2;

• offset: offset (default is a vector of 0s);

• w.x: x-based prior weights for Mallows estimation (default is a vector of 1s);

• maxiter, mytol: maximum number of iterations and tolerance for the algorithm.

The function returns a list with several components, including:

• coef: parameter estimates;

• se, V: estimated standard errors and asymptotic variance matrix for the regression
coefficients;

• weights: vector of weights on the residuals.
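
As an illustration, a Mallows-type fit starting from the MLE might be obtained as follows (the data frame mydata and the covariates x1 and x2 are hypothetical placeholders):

## Hypothetical example: Mallows-type fit with hat-matrix based x-weights,
## starting the algorithm from the maximum likelihood estimate
fit.ml  <- glm(y ~ x1 + x2, family = binomial, data = mydata)
X       <- model.matrix(fit.ml)
fit.rob <- logit.BI(fit.ml$coef, X, mydata$y, k1 = 1.2, w.x = sqrt(1 - hat(X)))
fit.rob$coef     # robust parameter estimates
fit.rob$se       # estimated standard errors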

4.2 The Bianco and Yohai estimator


An alternative method is given by the Bianco and Yohai estimator (Bianco and Yohai,
1996), defined as
\hat{\beta}_{BY} = \arg\min_{\beta} \sum_{i=1}^{n} \left\{ \rho_k\big(d(x_i^T \beta; y_i)\big) + C(x_i^T \beta) \right\}        (9)

where d(u; y) = −y log F(u) − (1 − y) log{1 − F(u)}, ρ_k is a bounded function and C(x_i^T β)
is a bias correction term. Bianco and Yohai (1996) proposed the following ρ function,
 2
x − x

if x ≤ k
ρk (x) = k 2k

 otherwise
2

but stressed that other choices are possible. Croux and Haesbroeck (2003) extend the
Bianco and Yohai estimator by including weights for downweighting high-leverage points,
thus defining a bounded-influence estimator
\hat{\beta}_{WBY} = \arg\min_{\beta} \sum_{i=1}^{n} w(x_i) \left\{ \rho_k\big(d(x_i^T \beta; y_i)\big) + C(x_i^T \beta) \right\} .        (10)
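
A direct R transcription of this ρ function (an illustrative sketch only; the constant k plays the role of the tuning constant const of the functions described below) is:

## Bianco-Yohai rho function: x - x^2/(2k) for x <= k, constant k/2 beyond k
rho.BY <- function(x, k = 0.5) ifelse(x <= k, x - x^2 / (2 * k), k / 2)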

Croux and Haesbroeck (2003) suggested a decreasing function of robust Mahalanobis distances for the weights
w(xi ), where the distances are computed using the Minimum Covariance Determinant
(MCD) estimator (see Rousseeuw and Leroy, 1987). More precisely, w(xi ) are obtained as
follows. The MCD method seeks h points whose covariance has minimum determinant,
and Croux and Haesbroeck (2003) suggest h = 3/4 n, giving a 25% breakdown point
estimator. The method is implemented by the function cov.rob (or cov.mcd, which is a
convenient wrapper) in the MASS library. If X is the design matrix, the robust estimates of
multivariate location and scale are obtained by

> hp <- floor(nrow(X) * 0.75) + 1


> mcdx <- cov.rob(X, quan = hp, method = "mcd")

Once the robust estimates of location and scale have been computed, we can obtain the
robust distances RD_i. Finally, the weights are defined as w(x_i) = W(RD_i), where W is
the weight function W(t) = I{t^2 ≤ χ^2_{p,0.975}}, with I_A denoting the indicator function of
the set A.

> rdx <- sqrt(mahalanobis(X, center = mcdx$center, cov = mcdx$cov))


> vc <- sqrt(qchisq(0.975, ncol(X)))
> wx <- as.numeric(rdx <= vc)

Notice that the same weights could also be used in (8). Croux and Haesbroeck (2003) have
implemented both the Bianco and Yohai estimator and their weighted version in some
public-domain S-PLUS functions, which can also be used in R with a few changes. The
two functions are BYlogreg and WBYlogreg, respectively, which we slightly modified in
order to deal with dummy covariates in the design matrix. The arguments of the function
BYlogreg are

• x0, y: design matrix (not including the intercept), response vector;

• x0cont: design matrix used for computing the weighted MLE (if initwml=T);

• initwml: logical value for selecting one of the two possible methods for computing
the initial value: if initwml=T a weighted MLE, otherwise the classical MLE;

• const: tuning constant, default is 0.5;

• kmax, maxhalf: maximum number of iterations and maximum number of step-halving
steps of the algorithm.

The function returns a list, including the components coef and sterror for parameter
estimates and standard errors. The function WBYlogreg has exactly the same arguments,
with the exception of initwml, as the weighted MLE is always used as the starting point.
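
For illustration, calls to the two functions might look as follows (x0, y, and xcont, the matrix of continuous covariates used for the weights, are placeholder objects):

## Illustrative calls: weighted MLE used as starting value
fit.by  <- BYlogreg(x0, y, x0cont = xcont, initwml = TRUE, const = 0.5)
fit.wby <- WBYlogreg(x0, y, xcont)
fit.by$coef       # parameter estimates
fit.by$sterror    # standard errors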

Example: Food Stamp Data
This is a classical example in robust statistics; see, for example, Künsch et al. (1989). The
food stamp data set, of sample size 150, has a binary response variable indicating participation
in the US Food Stamp Program. The covariates included in the model are tenancy
(Tenancy), supplemental income (SupInc), and log(monthly income + 1) (log(Inc+1)).
The data are contained in the file foodstamp.txt. Let us start the analysis by getting
the MLE.
> food <- read.table("foodstamp.txt", T)
> food.glm <- glm(y ~ Tenancy + SupInc + log(Inc+1), binomial, food)
> summary(food.glm)
....
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.9264 1.6229 0.571 0.56813
Tenancy -1.8502 0.5347 -3.460 0.00054
SupInc 0.8961 0.5009 1.789 0.07365
log(Inc + 1) -0.3328 0.2729 -1.219 0.22280

The only significant coefficient seems to be that of the variable Tenancy. A look at some
diagnostic plots is quite useful (see Figure 28).
> glm.diag.plots(food.glm)
It is clear that there is an observation with totally anomalous covariate values, which has
a very strong effect on the fit. In fact, observation 5 is the only one with a zero value for
the monthly income.
> food$Inc
[1] 271 287 714 521 0 518 458 1266 350 168 235 450 683 519
...
Any kind of robust method suitable for this data set must clearly bound the influence
of the design points. We get both the weights based on the hat matrix (used in their
examples by Cantoni and Ronchetti, 2001), and those based on the robust estimation of
location and scale. Following the suggestions in the code by Croux and Haesbroeck, in
the latter case we compute the weights using only the continuous covariate.
> X.food <- model.matrix(food.glm)
> w.hat.food <- sqrt(1 - hat(X.food))
> hp.food <- floor(nrow(X.food) * 0.75) + 1
> mcdx.food <- cov.rob(as.matrix(X.food[,-(1:3)]), quan = hp.food, method = "mcd")
> rdx.food <- sqrt(mahalanobis(as.matrix(X.food[,-(1:3)]),
center = mcdx.food$center, cov = mcdx.food$cov))
> vc.food <- sqrt(qchisq(0.975, 1))
> w.rob.food <- as.numeric(rdx.food <= vc.food)
The two sets of weights present some differences, with the hat-matrix-based ones providing
a milder degree of downweighting:

Figure 28: Food stamp data: diagnostic plots for maximum likelihood estimates (residuals vs. linear predictor, ordered deviance residuals vs. normal quantiles, Cook statistics vs. h/(1−h), and Cook statistics by case)

> mean(w.rob.food)
[1] 0.94
> mean(w.hat.food)
[1] 0.9864798

However, both types of weight reach their minimum value at observation 5. We now
compute the robust estimates β̂M , including a Huber-type version without any prior x-
weights. We start the algorithm from the MLE and, following Cantoni and Ronchetti
(2001), we set k = 1.2.

> food.hub <- logit.BI(food.glm$coef, X.food, food$y, 1.2)


> food.mal <- logit.BI(food.glm$coef, X.food, food$y, 1.2, w.x = w.hat.food)
> food.mal.wrd <- logit.BI(food.glm$coef, X.food, food$y, 1.2, w.x = w.rob.food)
> tab.coef <- (cbind(food.glm$coef, food.hub$coef, food.mal$coef, food.mal.wrd$coef))
> colnames(tab.coef)<- c("MLE", "HUB", "MAL-HAT", "MAL-ROB")
> print(tab.coef, digits = 3)
MLE HUB MAL-HAT MAL-ROB
(Intercept) 0.926 0.710 6.687 8.065
Tenancy -1.850 -1.778 -1.855 -1.784
SupInc 0.896 0.802 0.606 0.586
log(Inc + 1) -0.333 -0.287 -1.298 -1.540

The standard errors are as follows.
> tab.se <- cbind(sqrt(diag(vcov(food.glm))), food.hub$se, food.mal$se,
food.mal.wrd$se)
> colnames(tab.se) <- c("MLE", "HUB", "MAL-HAT", "MAL-ROB")
> print(tab.se, digits = 3)
MLE HUB MAL-HAT MAL-ROB
(Intercept) 1.623 1.636 3.039 3.328
Tenancy 0.535 0.527 0.581 0.588
SupInc 0.501 0.516 0.553 0.561
log(Inc + 1) 0.273 0.276 0.519 0.572
We notice that the Mallows estimates are quite different from both the MLE and the
Huber estimates. The estimated weights on the residuals, ψ_c(r_i)/r_i, where r_i is the Pearson
residual, provide an explanation for this fact.
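
Here ψ_c is assumed to be the standard Huber function, so the weight attached to each residual is min(1, c/|r|); a one-line sketch with c = 1.2, as used above, is:

## Weight on a Pearson residual r for the Huber psi function with tuning constant c:
## psi_c(r)/r equals 1 if |r| <= c, and c/|r| otherwise
w.huber <- function(r, c = 1.2) pmin(1, c / abs(r))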
> wei.food <- cbind(food.hub$weights, food.mal$weights, food.mal.wrd$weights)
> cond <- apply(wei.food, 1, "<", 1)
> cond <- apply(cond, 2, sum)
> wei.food[(cond>0),]
[,1] [,2] [,3]
5 0.8412010 0.04237215 0.02127919
22 0.4953700 0.60685048 0.65761723
25 1.0000000 0.97042523 0.93395722
26 0.8020679 1.00000000 1.00000000
40 0.6464152 0.41587812 0.36433034
51 1.0000000 0.96380099 0.92400623
52 0.8144845 1.00000000 1.00000000
59 1.0000000 0.97674194 0.94117456
66 0.2543750 0.13502255 0.11816794
79 0.6857541 0.54321273 0.50018522
94 0.7980593 1.00000000 1.00000000
95 0.6679639 0.48234377 0.43440327
103 0.4653931 0.45762357 0.47048220
107 0.8014854 1.00000000 1.00000000
109 0.9518519 0.52815274 0.45262133
120 0.4756079 0.50482509 0.52859808
137 0.2884602 0.23841016 0.23198605
141 0.7969428 1.00000000 1.00000000
147 0.3144637 0.35221333 0.36859260
150 0.7920675 1.00000000 1.00000000
The weights based on the Huber-type estimates are quite different from the Mallows-
type ones. In particular, Huber-type regression does not downweight observation 5
enough. Things are different if we choose another starting point for the algorithm. A sensible
choice could be to start from a weighted MLE, obtained by selecting only the observations
for which the robust distance weights W (RDi ) are equal to 1.
> food.glm.wml <- glm(y ~ Tenancy + SupInc + log(Inc+1), binomial, food,
                      subset = (w.rob.food==1))
> summary(food.glm.wml)
....
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 5.6408 2.7665 2.039 0.041452
Tenancy -1.7749 0.5352 -3.317 0.000911
SupInc 0.6491 0.5181 1.253 0.210238
log(Inc + 1) -1.1123 0.4685 -2.374 0.017589

If we restart the algorithm from the weighted MLE, the Huber-type estimates are
close to the Mallows ones, and the weight given to observation 5 is now much smaller.

> food.hub.wml <- logit.BI(food.glm.wml$coef, X.food, food$y, 1.2)


> food.hub.wml$coef
(Intercept) Tenancy SupInc log(Inc + 1)
6.531852 -1.847610 0.607605 -1.270387
> food.hub.wml$weights[5]
5
0.04579391
A similar result is obtained with the Bianco and Yohai estimator. If we call BYlogreg
with the argument initwml = F, the MLE is used as the starting point.
> food.BY<- BYlogreg(X.food[,-1], food$y, initwml = F)
> food.BY
....
$coef
(Intercept) x0Tenancy x0SupInc x0log(Inc + 1)
[1,] 0.8814444 -1.768291 0.8456321 -0.3218968
$sterror
Tenancy SupInc log(Inc + 1)
5.7102449 0.5785001 0.6279328 0.9294853
The results are not much different from the MLE. A better option is to use the weighted
MLE as the starting point. This requires, as a further argument, the matrix used to compute
the weights.

> food.BY.wml <- BYlogreg(X.food[,-1], food$y, as.matrix(X.food[,-(1:3)]))


> food.BY.wml
....
$coef
(Intercept) x0Tenancy x0SupInc x0log(Inc + 1)
[1,] 5.369783 -1.691211 0.6173553 -1.070233
$sterror
Tenancy SupInc log(Inc + 1)
7.0927687 0.5391049 0.5240751 1.2190520
The results are similar to the weighted version of the Bianco and Yohai estimator.

> food.WBY <- WBYlogreg(X.food[,-1], food$y, as.matrix(X.food[,-(1:3)]) )


> food.WBY

....
$coef
(Intercept) subx01 subx02 subx03
[1,] 5.824949 -1.832782 0.6703187 -1.148582
$sterror
[1] 3.3923786 0.5822091 0.5220386 0.5906120

In order to compare all the various estimates, we follow the approach of Kordzakhia
et al. (2001), who proposed a goodness-of-fit discrepancy, the chi-square statistic based
on the arcsine transformation
X^2_{arc} = 4 \sum_{i=1}^{n} \left( \arcsin\sqrt{y_i} - \arcsin\sqrt{\hat{\pi}_i} \right)^2 ,

where π̂i are the fitted probabilities. The values of the statistic for the various estimates
show again the importance of using x-weights for this data set.

> X2.arc <- function(y, mu) 4 * sum((asin(sqrt(y)) - asin(sqrt(mu)))^2)


> X2.arc(food$y, plogis(X.food %*% food.glm$coef))
[1] 173.5109
> X2.arc(food$y, plogis(X.food %*% food.glm.wml$coef))
[1] 172.3801
> X2.arc(food$y, plogis(X.food %*% food.hub$coef))
[1] 175.4812
> X2.arc(food$y, plogis(X.food %*% food.hub.wml$coef))
[1] 170.2295
> X2.arc(food$y, plogis(X.food %*% food.mal$coef))
[1] 170.0075
> X2.arc(food$y, plogis(X.food %*% food.mal.wrd$coef))
[1] 169.8280
> X2.arc(food$y, plogis(X.food %*% t(food.BY$coef)))
[1] 174.8348
> X2.arc(food$y, plogis(X.food %*% t(food.BY.wml$coef)))
[1] 173.0532
> X2.arc(food$y, plogis(X.food %*% t(food.WBY$coef)))
[1] 171.0866

Finally, the S-PLUS code of Cantoni (2004) gives the following results for the Mallows
estimator using the x-weights (1 − h_{ii})^{1/2} (here we used our own port of the code):

> food.can <- glm.rob(X.food[,-1], food$y, chuber = 1.2,


weights.on.x = T, ni = rep(1,nrow(X.food)))
> food.can$coef
[1] 6.6870043 -1.8551298 0.6061823 -1.2975844
> food.can$sd
Tenancy SupInc log(Inc + 1)
3.0756946 0.5946090 0.5592972 0.5264943

The coefficients and standard errors are essentially the same as those obtained with our
function and stored in food.mal. The code by Cantoni (2004), however, is more reliable
and has a broader range of functions, including some functions for testing based on quasi-
deviances (Cantoni and Ronchetti, 2004).

Example: Vaso-constriction Data


We consider an example from Finney (1947), already analysed in Künsch et al. (1989).
These data consist of 39 observations on three variables: the occurrence of vaso-constriction
in the skin of the digits, and the rate and volume of air inspired. The model considered
by Künsch et al. (1989) regresses the occurrence of vaso-constriction on the logarithm of
air rate and volume. The data are in the file vaso.txt.
> vaso <- read.table("vaso.txt", T)
> vaso$lVol <- log(vaso$Vol)
> vaso$lRate <- log(vaso$Rate)
> vaso$Resp <- 1 - (as.numeric(vaso$Resp) - 1)
> vaso$y <- vaso$Resp
A plot of the data shows some differences in the covariates between the two groups of patients
with different response values.
> plot(vaso$Vol,vaso$Rate,type = "n", xlab = "Volume", ylab = "Rate")
> points(vaso$Vol[vaso$y==0], vaso$Rate[vaso$y==0], col = 1, pch = 16)
> points(vaso$Vol[vaso$y==1], vaso$Rate[vaso$y==1], col = 2, pch = 16)
> legend(2.5, 3.0, c("y=0 ", "y=1 "), fill = c(1, 2), text.col = c("black", "red"))
Standard diagnostic plots based on the maximum likelihood fit show that there are two quite
influential observations (4 and 18):
> vaso.glm <- glm(Resp ~ lVol + lRate,family = binomial, data = vaso)
> summary(vaso.glm)
....
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.924 1.288 -2.270 0.02318
lVol 5.220 1.858 2.810 0.00496
lRate 4.631 1.789 2.589 0.00964
> glm.diag.plots(vaso.glm)
If we re-estimate the model after removing these two observations, we can see that
their effect on the fitted model is huge.
> vaso.glm.w418 <- update(vaso.glm, data = vaso[-c(4,18),])
> summary(vaso.glm.w418)
....
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -24.58 14.02 -1.753 0.0796
lVol 39.55 23.25 1.701 0.0889

lRate          31.94      17.76   1.798   0.0721

Figure 29: Vaso-constriction data: covariate scatterplot (Rate against Volume, points marked by response value; observations 4 and 18 labelled)

There is a dramatic increase in both coefficient values and standard errors. In fact,
without the two observations we are in a situation of quasi-complete separation (Albert
and Anderson, 1984), with little overlap between observations with y_i = 0 and y_i = 1.
The model is therefore nearly unidentifiable. This is readily confirmed by Mallows (or Huber)
estimation, which assigns low weights to observations 4 and 18, and provides results
similar to those obtained with the MLE after removing the influential points.
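
As an illustrative check (not part of the original analysis), one can look at the sign of the linear predictor from the fit without observations 4 and 18; the two response groups are expected to show very little overlap.

## Rough check of quasi-complete separation after dropping observations 4 and 18
eta.w418 <- predict(vaso.glm.w418)          # linear predictor for the reduced data set
table(vaso$Resp[-c(4, 18)], eta.w418 > 0)   # expected to show very little overlap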
> X.vaso <- model.matrix(vaso.glm)
> vaso.mal <- logit.BI(vaso.glm$coef, X.vaso, vaso$y, 1.2, sqrt(1 - hat(X.vaso)))
> cbind(vaso.mal$coef, vaso.mal$se)
[,1] [,2]
(Intercept) -22.01822 22.83438
lVol 36.10633 40.69736
lRate 28.72225 30.03298
> vaso.mal$weights[c(4,18)]
4 18
3.726611e-05 1.544562e-04
The same happens with both versions of the Bianco and Yohai estimator. Once again,
the near-indeterminacy is reflected in large increases of the coefficients and standard errors.
Figure 30: Vaso-constriction data: diagnostic plots for maximum likelihood estimates (residuals vs. linear predictor, ordered deviance residuals, and Cook statistics)

> vaso.WBY <- WBYlogreg(X.vaso[,-1], vaso$y)
> vaso.WBY

....
$coef
(Intercept) subx01 subx02
[1,] -6.859868 10.74855 9.3733
$sterror
[1] 10.07252 15.34863 12.80866
Notice, however, that alternative methods may give different results, as in the case of
the OBRE (see Künsch et al., 1989).

References
Agostinelli, C. (2001), Wle: A package for robust statistics using weighted likelihood. R
News, 1/3, 32–38.

Agostinelli, C., Markatou, M. (1998), A one-step robust estimator for regression based
on the weighted likelihood reweighting scheme, Statistics & Probability Letters, 37,
341–350.

Albert, A., Anderson, J. A. (1984), On the existence of maximum likelihood estimates
in logistic regression models, Biometrika, 71, 1–10.

Becker, R.A., Chambers, J.M., Wilks, A.R. (1988), The New S Language, Wadsworth
and Brooks/Cole, Pacific Grove.

Belsley, D. A., Kuh, E., Welsch, R. E. (1980), Regression Diagnostics, Wiley.

Bianco, A.M., Yohai, V.J. (1996), Robust estimation in the logistic regression model.
In: Rieder, H. (Ed.), Robust Statistics, Data Analysis, and Computer Intensive
Methods, Springer, pp. 17–34.

Cantoni, E. (2004), Analysis of robust quasi-deviance for generalized linear models, Jour-
nal of Statistical Software, 10, 1–9.

Cantoni, E., Ronchetti, E. (2001), Efficient bounded-influence regression estimation,
Journal of the American Statistical Association, 96, 1022–1030.

Cushny, A.R., Peebles, A.R. (1905), The action of optical isomers. II. Hyoscines, J.
Physiol., 32, 501–510.

Croux, C., Haesbroeck, G. (2003), Implementing the Bianco and Yohai estimator for
logistic regression, Computational Statistics and Data Analysis, 44, 273–295.

Finney, D. J. (1947), The estimation from individual records of the relationship between
dose and quantal response, Biometrika, 34, 320–334.

Hampel, F.R., Ronchetti, E.M., Rousseeuw, P.J., Stahel, W.A. (1986), Robust Statistics:
The Approach Based on Influence Functions, Wiley.

Hawkins, D.M., Bradu, D., Kass, G.V. (1984), Location of several outliers in multiple
regression data using elemental sets, Technometrics, 26, 197–208.

Huber, P. J. (1981), Robust Statistics, Wiley.

Jørgensen, B. (1984), The delta algorithm and GLIM, International Statistical Review,
52, 283–300.

Kordzakhia, N., Mishra, G.D., Reiersølmoen, L. (2001), Robust estimation in the logistic
model, Journal of Statistical Planning and Inference, 98, 211–223.

Krasker, W. S., Welsch, R. E. (1982), Efficient bounded-influence regression estimation,
Journal of the American Statistical Association, 77, 595–604.

Künsch, H.R., Stefanski, L.A., Carroll, R.J. (1989), Conditionally unbiased bounded-
influence estimation in general regression models, with application to generalized
linear models, Journal of the American Statistical Association, 84, 460–466.

Li, G. (1985), Robust regression, in Exploring Data Tables, Trends, and Shapes, eds.
Hoaglin and Tukey, pp. 281–343, Wiley.

Marazzi, A. (1993), Algorithms, Routines, and S Functions for Robust Statistics, Wadsworth
and Brooks/Cole, Pacific Grove.

Markatou, M., Basu, A., Lindsay, B.G. (1998), Weighted likelihood equations with boot-
strap root search, Journal of the American Statistical Association, 93, 740–750.

McKean, J.W., Sheather, S.J., Hettmansperger, T.P. (1993), The use and interpretation
of residuals based on robust estimation, Journal of the American Statistical Association,
88, 1254–1263.

McNeil, D. R. (1977), Interactive Data Analysis, Wiley.

Rousseeuw, P.J., Leroy, A.M. (1987), Robust Regression and Outlier Detection, Wiley.

Staudte, R.G., Sheather, S.J. (1990), Robust Estimation and Testing, Wiley.

Street, J.O., Carroll, R.J., Ruppert, D. (1988), A note on computing robust regression
estimates via iteratively reweighted least squares, American Statistician, 42, 152–154.

Venables, W. N., Ripley, B. D. (2002), Modern Applied Statistics with S. Fourth edition.
Springer.


