MultilevelPRI Workshop
March 2009
I. Multilevel Modeling
Multilevel models are statistical models used to analyze data that have a hierarchical or
nested structure, for example:
- students nested within classrooms and schools
Observations within the same group tend to be correlated, which violates the independence
assumption of ordinary least squares (OLS) regression. A multilevel approach addresses this
issue and is therefore more appropriate than OLS for such data. In different fields and
literatures, this model is also known as the random coefficient model, the variance
component model, or the hierarchical linear model.
The basic two-level model:
Level 1
$$Y_{ij} = \beta_{0j} + \beta_{1j} X_{ij} + r_{ij} \tag{1}$$
Level 2
$$\beta_{0j} = \gamma_{00} + \gamma_{01} Z_j + u_{0j} \tag{2}$$
$$\beta_{1j} = \gamma_{10} \tag{3}$$
Substitution of (2) and (3) in (1)
$$Y_{ij} = \gamma_{00} + \gamma_{10} X_{ij} + \gamma_{01} Z_j + u_{0j} + r_{ij} \tag{4}$$
Where
$i$ : subjects
$j$ : groups
$\beta_{0j}$ : the mean of $Y_{ij}$ for group $j$
$\gamma_{00}$ : the grand mean of $Y_{ij}$
$X_{ij}$ : the predictor variable for subject $i$ in group $j$
$\gamma_{10}$ : the fixed effect of the predictor variable $X_{ij}$ across groups
$r_{ij}$ and $u_{0j}$ : random components, normally distributed and independent of each other
We could add more predictors at level 1, and we could add more random components at level 2.
However, the basic idea of multilevel modeling is to decompose the unexplained variance
into two unobserved components: one at the subject level and one at the group level.
To estimate this model, maximum likelihood estimators (MLE) are generally used.
Maximum likelihood requires an iterative procedure that runs until the model converges;
under the model assumptions, the resulting estimators are consistent and asymptotically efficient.
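The two-level model with one subject-level and one group-level random component can be illustrated with a small simulation. The sketch below is only an illustration in Python, not part of the workshop materials: the function name, the parameter values and the use of standard-normal predictors are all assumptions chosen for the example.

```python
import random

def simulate_two_level(n_groups, n_per_group, gamma00, gamma10, gamma01,
                       sd_u, sd_r, seed=0):
    """Draw data from Y_ij = g00 + g10*X_ij + g01*Z_j + u_0j + r_ij.

    All parameter values are hypothetical; they are not estimates
    taken from the workshop data.
    """
    rng = random.Random(seed)
    rows = []
    for j in range(n_groups):
        z_j = rng.gauss(0.0, 1.0)        # group-level predictor Z_j
        u_0j = rng.gauss(0.0, sd_u)      # group-level random component
        for i in range(n_per_group):
            x_ij = rng.gauss(0.0, 1.0)   # subject-level predictor X_ij
            r_ij = rng.gauss(0.0, sd_r)  # subject-level random component
            y_ij = gamma00 + gamma10 * x_ij + gamma01 * z_j + u_0j + r_ij
            rows.append({"group": j, "subject": i, "x": x_ij, "z": z_j,
                         "u": u_0j, "r": r_ij, "y": y_ij})
    return rows

data = simulate_two_level(n_groups=50, n_per_group=20, gamma00=290.0,
                          gamma10=-8.0, gamma01=5.0, sd_u=49.0, sd_r=50.0)
n_groups_seen = len({row["group"] for row in data})
print(len(data), "simulated subjects in", n_groups_seen, "groups")
```

Every subject in a group shares the same draw of the group-level component, which is what makes observations within a group correlated.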
Assumptions of OLS and Multilevel modeling

OLS estimation                        | Multilevel estimation
--------------------------------------+-------------------------------------------
Linearity: Function                   | Linearity: at each level
Normality: Residuals                  | Normality: at each level
Homoscedasticity: Constant variance   | Homoscedasticity: At level 1
Independence: between observations    | Independence:
                                      | - No correlation between residuals at
                                      |   level 1 and level 2.
                                      | - Observations in the highest level are
                                      |   independent of each other.
Main indicators for multilevel modeling
Proportion of variance:
At level 1: var(rij) / (var(rij) + var(u0j))
At level 2: var(u0j) / (var(rij) + var(u0j))
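As a worked example of these two ratios, the following sketch plugs in the variance estimates reported in the null-model xtmixed output of this handout (var(_cons) = 2422.165 for schools and var(Residual) = 2489.893 for students). The function name is an assumption made for illustration.

```python
def variance_shares(var_level1, var_level2):
    """Return the proportion of total variance at level 1 and at level 2."""
    total = var_level1 + var_level2
    return var_level1 / total, var_level2 / total

# Variance components from the null model reported in this handout.
share_l1, share_l2 = variance_shares(var_level1=2489.893, var_level2=2422.165)
print(f"Level 1: {share_l1:.3f}  Level 2: {share_l2:.3f}")
```

With these inputs, roughly half of the variance in math achievement (about 49%) lies between schools, which is the level-2 proportion, often called the intraclass correlation.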
idschool   : Schools ID
           : Classroom code
           : Students code
math       : Math achievement in 2004
age        : Age (years)
gender     : 1 Female 0 Male
nsibbl     : Number of siblings
nuclear    : The student lives with both parents
ses_i      : Socioeconomic index
indig      : The student has an indigenous language as mother tongue
gestion    : The school is public
tipo       : Full grade school
area       : 1 Urban 0 Rural
prog_ebi   : School participates in the government-headed bilingual program
bserv      : Number of basic services at school
dtotal     : Days of class during the year
*Null Model
xtmixed

Performing EM optimization:
Performing gradient-based optimization:
Iteration 0:
Iteration 1:

                                                Number of obs      =     10327
                                                Number of groups   =       564
                                                Obs per group: min =         1
                                                               avg =      18.3
                                                               max =        30
                                                Wald chi2(0)       =         .
                                                Prob > chi2        =         .
------------------------------------------------------------------------------
        math |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _cons |   291.5814   2.171078   134.30   0.000     287.3262    295.8366
------------------------------------------------------------------------------
------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
idschool: Identity           |
                  var(_cons) |   2422.165    160.494      2127.172    2758.067
-----------------------------+------------------------------------------------
               var(Residual) |   2489.893   35.66315      2420.967    2560.782
------------------------------------------------------------------------------
LR test vs. linear regression: chibar2(01) =  4803.15 Prob >= chibar2 = 0.0000

*Model 1 Student variables
                                                Number of obs      =     10084
                                                Number of groups   =       564
                                                Obs per group: min =         1
                                                               avg =      17.9
                                                               max =        30
                                                Wald chi2(6)       =    581.48
                                                Prob > chi2        =    0.0000
------------------------------------------------------------------------------
        math |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |  -8.197397   .5646148   -14.52   0.000    -9.304021   -7.090772
      gender |  -10.04743   1.053241    -9.54   0.000    -12.11175   -7.983119
      nsibbl |  -1.418298   .2599372    -5.46   0.000    -1.927766   -.9088307
     nuclear |  -4.110831    1.19185    -3.45   0.001    -6.446815   -1.774847
       ses_i |   .4424885   .0532977     8.30   0.000      .338027      .54695
       indig |  -16.40169   2.547377    -6.44   0.000    -21.39446   -11.40892
       _cons |   360.9887   9.142899    39.48   0.000      343.069    378.9085
------------------------------------------------------------------------------
------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
idschool: Identity           |
                  var(_cons) |   1418.413   111.9487      1215.126    1655.709
-----------------------------+------------------------------------------------
               var(Residual) |   2415.155   35.33061      2346.892    2485.404
------------------------------------------------------------------------------
LR test vs. linear regression: chibar2(01) =  1760.38 Prob >= chibar2 = 0.0000
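The z, P>|z| and confidence-interval columns in these tables follow directly from the coefficient and its standard error. A minimal sketch of that arithmetic, assuming the usual normal-approximation Wald test and using the age row of the six-predictor model as input (the function name is illustrative, not a Stata routine):

```python
from statistics import NormalDist

def wald_summary(coef, se, level=0.95):
    """Wald z statistic, two-sided p-value and normal-based confidence interval."""
    z = coef / se
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    crit = NormalDist().inv_cdf(0.5 + level / 2.0)  # about 1.96 for 95%
    return z, p, coef - crit * se, coef + crit * se

# Coefficient and standard error of age in the model with student variables.
z, p, lower, upper = wald_summary(coef=-8.197397, se=0.5646148)
print(f"z = {z:.2f}, 95% CI = [{lower:.4f}, {upper:.4f}]")
```

The result agrees with the reported row for age (z = -14.52, interval from -9.304021 to -7.090772) up to rounding.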

*Model 2 School variables
                                                Number of obs      =     10084
                                                Number of groups   =       564
                                                Obs per group: min =         1
                                                               avg =      17.9
                                                               max =        30
                                                Wald chi2(12)      =   1170.77
                                                Prob > chi2        =    0.0000
------------------------------------------------------------------------------
        math |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |  -7.704743   .5607968   -13.74   0.000    -8.803884   -6.605601
      gender |  -10.23747   1.045682    -9.79   0.000    -12.28697   -8.187969
      nsibbl |  -1.104413   .2590295    -4.26   0.000    -1.612102   -.5967245
     nuclear |  -3.823928   1.185488    -3.23   0.001    -6.147442   -1.500415
       ses_i |   .1828045   .0559183     3.27   0.001     .0732067    .2924024
       indig |  -13.38815   2.510379    -5.33   0.000     -18.3084   -8.467896
     gestion |  -39.19536   4.042729    -9.70   0.000    -47.11896   -31.27175
        tipo |   11.72788   3.866162     3.03   0.002     4.150338    19.30541
        area |   4.210408   4.328804     0.97   0.331    -4.273892    12.69471
       bserv |   10.77618   1.946826     5.54   0.000     6.960472    14.59189
      dtotal |    .404681   .1593815     2.54   0.011      .092299    .7170629
    prog_ebi |  -16.20831   4.441248    -3.65   0.000    -24.91299   -7.503619
       _cons |   270.5271   44.33491     6.10   0.000     183.6323    357.4219
------------------------------------------------------------------------------
------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
idschool: Identity           |
                  var(_cons) |    817.257   63.49015      701.8292    951.6688
-----------------------------+------------------------------------------------
               var(Residual) |   2400.291   34.86962      2332.912    2469.617
------------------------------------------------------------------------------
LR test vs. linear regression: chibar2(01) =  1479.13 Prob >= chibar2 = 0.0000

*GLLAMM*
********
*1st - Identify the dependent variable: math
*2nd - Number of levels: 2
*3rd - Identify the level-2 id: idschool
*4th - With or without random coefficients: Without
*5th - Identify the predictors at level 1: age gender nsibbl nuclear ses_i indig
*6th - Identify the predictors at level 2: gestion tipo area bserv dtotal prog_ebi

*Syntax
******************************a two level model
*Define equation
*gen cons=1
*eq int: cons    /* Intercept equation */
*gllamm [dependent variable] [independent variables], i([id level variable]) eqs([equations]) nrf([number of random components]) [other options]

******************************a three level model
*Define equation
*gen cons=1
*eq int: cons    /* Intercept equation */
*gllamm [dependent variable] [independent variables], i([level 2 level 3]) eqs([equations]) nrf([number of random components]) [other options]

******************************a two level model with random coefficients
*Define equation
*gen cons=1
*eq int: cons    /* Intercept equation */
*eq slope: [variable with random coefficient]
*gllamm [dependent variable] [independent variables], i([id level variable]) eqs([equations]) nrf([number of random components]) [other options]

*Examples*
**********
*Null model
gen cons=1
eq int: cons
gllamm math, i(idschool) eqs(int) nrf(1) adapt
[gllamm iteration logs; log likelihood at successive iterations]
log likelihood = -58198.98, -56186.657, -55826.109, -55801.265, -55801.26
log likelihood = -55094.61, -54598.877, -54489.213, -54489.213
log likelihood = -54787.036, -54495.168, -54276.704, -54069.796, -54069.796
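
Estimates of Fixed Effects (a)
                                                                      95% Confidence Interval
Parameter     Estimate     Std. Error         df         t    Sig.   Lower Bound   Upper Bound
Intercept   291.581391       2.171077    539.227   134.303    .000    287.316585    295.846196
a Dependent Variable: MAth.

Estimates of Covariance Parameters (a)
                                                                               95% Confidence Interval
Parameter                                    Estimate   Std. Error   Wald Z   Sig.   Lower Bound   Upper Bound
Residual                                  2489.893207    35.663150   69.817   .000   2420.966727   2560.782067
Intercept [subject = idschool] Variance   2422.162426   160.493738   15.092   .000   2127.170040   2758.063864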

Estimates of Fixed Effects (a)
                                                                      95% Confidence Interval
Parameter     Estimate     Std. Error         df         t    Sig.   Lower Bound   Upper Bound
Intercept   360.988698       9.142899   9964.925    39.483    .000    343.066768    378.910628
Age          -8.197397        .564615   9830.408   -14.519    .000     -9.304158     -7.090636
gender      -10.047433       1.053241   9675.797    -9.540    .000    -12.112006     -7.982860
Nsibbl       -1.418299        .259937   9791.842    -5.456    .000     -1.927829      -.908768
nuclear      -4.110831       1.191850   9579.555    -3.449    .001     -6.447110     -1.774552
ses_i          .442489        .053298   8310.954     8.302    .000       .338012       .546965
indig       -16.401693       2.547377   8871.112    -6.439    .000    -21.395141    -11.408245
a Dependent Variable: MAth.

Estimates of Covariance Parameters (a)
                                                                               95% Confidence Interval
Parameter                                    Estimate   Std. Error   Wald Z   Sig.   Lower Bound   Upper Bound
Residual                                  2415.155055    35.330614   68.359   .000   2346.891618   2485.404054
Intercept [subject = idschool] Variance   1418.412424   111.948611   12.670   .000   1215.125679   1655.708409

Estimates of Fixed Effects (a)
                                                                      95% Confidence Interval
Parameter     Estimate     Std. Error          df         t    Sig.   Lower Bound   Upper Bound
Intercept   270.527102      44.334909     547.503     6.102    .000    183.439762    357.614443
Age          -7.704743        .560797    9941.506   -13.739    .000     -8.804018     -6.605467
gender      -10.237467       1.045682    9863.263    -9.790    .000    -12.287217     -8.187717
Nsibbl       -1.104413        .259030    9865.989    -4.264    .000     -1.612164      -.596662
nuclear      -3.823929       1.185488    9732.357    -3.226    .001     -6.147731     -1.500126
ses_i          .182805        .055918   10022.105     3.269    .001       .073193       .292416
indig       -13.388150       2.510379    8566.070    -5.333    .000    -18.309099     -8.467202
gestion     -39.195354       4.042729     502.568    -9.695    .000    -47.138085    -31.252623
Tipo         11.727876       3.866161     506.687     3.033    .003      4.132195     19.323557
Area          4.210408       4.328804     489.480      .973    .331     -4.294922     12.715738
Bserv        10.776182       1.946826     547.469     5.535    .000      6.952018     14.600345
Dtotal         .404681        .159381     514.081     2.539    .011       .091562       .717800
prog_ebi    -16.208304       4.441248     549.555    -3.649    .000    -24.932203     -7.484406
a Dependent Variable: MAth.

Estimates of Covariance Parameters (a)
                                                                               95% Confidence Interval
Parameter                                    Estimate   Std. Error   Wald Z   Sig.   Lower Bound   Upper Bound
Residual                                  2400.291325    34.869622   68.836   .000   2332.911916   2469.616793
Intercept [subject = idschool] Variance    817.256689    63.490120   12.872   .000    701.828974    951.668455

2. Select the variables that will be used for the analysis at each level.
3. Save the template file (.mdmt) and the MDM file (.mdm) under names of your choice.
4. Press Make MDM to create the data file with the information from both levels. If an
error occurs, a message will appear on the screen.
5. Press Check Stats and save the descriptive statistics of your dataset at each level
(mean, standard deviation, minimum and maximum).
6. Press Done and you will have the following window.
7. Once you have the window with the data at both levels, you can start
running the different multilevel models.
8. Running a null model: click on Math and select the option Outcome variable.
The model shown is then a simple ANOVA with random effects by school.
Select Run Analysis from the menu and then choose Run the model shown.
Then go to File in the menu and select View Output; a text file will
open with the results produced by the program.

Weighting Specification
                         Weight
                         Variable
            Weighting?   Name        Normalized?
Level 1        no
Level 2        no
Precision      no

The outcome variable is MATH

Level-1 Coefficients     Level-2 Predictors
INTRCPT1, B0             INTRCPT2, G00

Sigma_squared = 2489.89329
Tau
INTRCPT1,B0   2422.15990

Final estimation of variance components:
Random Effect      Standard Deviation   Variance Component    df    Chi-square   P-value
INTRCPT1, U0             49.21544           2422.15990        563   9762.13458    0.000
level-1,  R              49.89883           2489.89329

Statistics for current covariance components model
Deviance = 111599.013039
Number of estimated parameters = 2

9. To run Model 1 (student variables), add each level-1 variable by clicking on it and
selecting the option add variable uncentered. Other options are group mean centered and
grand mean centered; for this exercise, however, all variables are entered uncentered.

Weighting Specification
                         Weight
                         Variable
            Weighting?   Name        Normalized?
Level 1        no
Level 2        no
Precision      no

The outcome variable is MATH

Level-1 Coefficients     Level-2 Predictors
INTRCPT1, B0             INTRCPT2, G00
# AGE slope, B1          INTRCPT2, G10
# GENDER slope, B2       INTRCPT2, G20
# NSIBBL slope, B3       INTRCPT2, G30
# NUCLEAR slope, B4      INTRCPT2, G40
# SES_I slope, B5        INTRCPT2, G50
# INDIG slope, B6        INTRCPT2, G60
'#' - The residual parameter variance for this level-1 coefficient has been set
to zero.

The model specified for the covariance components was:
Sigma squared (constant across level-2 units)
Tau dimensions
INTRCPT1

Summary of the model specified (in equation format)
Level-1 Model
Y = B0 + B1*(AGE) + B2*(GENDER) + B3*(NSIBBL) + B4*(NUCLEAR) + B5*(SES_I) +
B6*(INDIG) + R
Level-2 Model
B0 = G00 + U0
B1 = G10
B2 = G20
B3 = G30
B4 = G40
B5 = G50
B6 = G60

Run-time deletion has reduced the number of level-1 records to 10084
Iterations stopped due to small change in likelihood function
******* ITERATION 6 *******
Sigma_squared = 2415.14524
Tau
INTRCPT1,B0   1418.53123

10. To run Model 2 (school variables), click on the Level 2 button and choose where to
add the level-2 variables: to the intercept or to the coefficients; in other words,
whether to add predictors of the intercept or cross-level interactions. In this case,
for simplicity, we add predictors only to the intercept.

Weighting Specification
                         Weight
                         Variable
            Weighting?   Name        Normalized?
Level 1        no
Level 2        no
Precision      no

The outcome variable is MATH

Level-1 Coefficients     Level-2 Predictors
INTRCPT1, B0             INTRCPT2, G00
                         GESTION, G01
                         TIPO, G02
                         AREA, G03
                         PROG_EBI, G04
                         BSERV, G05
                         DTOTAL, G06
# AGE slope, B1          INTRCPT2, G10
# GENDER slope, B2       INTRCPT2, G20
# NSIBBL slope, B3       INTRCPT2, G30
# NUCLEAR slope, B4      INTRCPT2, G40
# SES_I slope, B5        INTRCPT2, G50
# INDIG slope, B6        INTRCPT2, G60
'#' - The residual parameter variance for this level-1 coefficient has been set
to zero.

The model specified for the covariance components was:
Sigma squared (constant across level-2 units)
Tau dimensions
INTRCPT1

Summary of the model specified (in equation format)
Level-1 Model
Y = B0 + B1*(AGE) + B2*(GENDER) + B3*(NSIBBL) + B4*(NUCLEAR) + B5*(SES_I) +
B6*(INDIG) + R
Level-2 Model
B0 = G00 + G01*(GESTION) + G02*(TIPO) + G03*(AREA) + G04*(PROG_EBI)
+ G05*(BSERV) + G06*(DTOTAL) + U0
B1 = G10
B2 = G20
B3 = G30
B4 = G40
B5 = G50
B6 = G60

Run-time deletion has reduced the number of level-1 records to 10084
Iterations stopped due to small change in likelihood function
******* ITERATION 6 *******
Sigma_squared = 2400.29539
Tau
INTRCPT1,B0   817.21163

Random level-1 coefficient   Reliability estimate
INTRCPT1, B0                 0.799

The value of the likelihood function at iteration 6 = -5.404159E+004
The outcome variable is MATH

Final estimation of variance components:
Random Effect      Standard Deviation   Variance Component
level-1,  R              48.99281           2400.29539

Statistics for current covariance components model
Deviance = 108083.173301
Number of estimated parameters = 2

11. To include random coefficients, click next to the coefficient of the corresponding
variable in the Level 2 Model and add a random component for that variable
across schools.
In this case, we add a random component to the coefficient of age; this means that
the coefficient varies randomly across schools.
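In equation form, and keeping the B/G/U notation of the HLM output, giving AGE a random coefficient changes only its level-2 equation. The block below is a sketch under the common assumption that the two school-level random effects are jointly normal with an unstructured covariance matrix; the symbol U1 and the tau elements are notation assumed here, not copied from the program output.

```latex
\begin{aligned}
B_{1j} &= G_{10} + U_{1j} \\
\begin{pmatrix} U_{0j} \\ U_{1j} \end{pmatrix}
&\sim N\!\left(
\begin{pmatrix} 0 \\ 0 \end{pmatrix},
\begin{pmatrix} \tau_{00} & \tau_{01} \\ \tau_{01} & \tau_{11} \end{pmatrix}
\right)
\end{aligned}
```

Here tau_11 would be the variance of the age coefficient across schools and tau_01 its covariance with the school intercept.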
V. Multilevel in SAS
The procedure used to run a multilevel model in SAS is PROC MIXED, and the syntax is:
PROC MIXED DATA = [name of the data] METHOD = REML COVTEST;
CLASS [level 2 id];
MODEL [dependent variable] = [independent variables] / SOLUTION;
RANDOM [random variables] / SUBJECT=[level 2 id];
RUN;
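COVTEST  : requests standard errors and Wald tests for the covariance parameters
METHOD   : estimation method (for example REML or ML)
CLASS    : declares the classification variables
MODEL    : dependent variable and fixed effects
SOLUTION : prints the fixed-effects estimates
RANDOM   : random effects (for example, intercept)
SUBJECT  : variable that identifies the level-2 units
RUN      : executes the procedure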
* Example;
* Null model;
PROC MIXED DATA = achievement METHOD = REML COVTEST;
CLASS idschool;
MODEL math = / SOLUTION;
RANDOM intercept / SUBJECT=idschool;
RUN;
* Model 1 Student variables;
PROC MIXED DATA = achievement METHOD = REML COVTEST;
CLASS idschool;
MODEL math = age gender nsibbl nuclear ses_i indig / SOLUTION;
RANDOM intercept / SUBJECT=idschool;
RUN;
* Model 2 School variables;
PROC MIXED DATA = achievement METHOD = REML COVTEST;
CLASS idschool;
MODEL math = age gender nsibbl nuclear ses_i indig gestion tipo area bserv dtotal
      prog_ebi / SOLUTION;
RANDOM intercept / SUBJECT=idschool;
RUN;
VI. Summary
In sum, the programs give essentially the same results in terms of significance, estimated coefficients and standard errors, as
the following table shows:
Table 1. Math Achievement for Sixth Grade in Peru (2004)

                            GLLAMM             XTMIXED             MIXED               HLM
                        Coef.       se      Coef.       se      Coef.       se      Coef.       se
Age (years)             -8.20 ***  0.57     -8.20 ***  0.56     -8.20 ***  0.56     -8.20 ***  0.59
Female                 -10.05 ***  1.05    -10.05 ***  1.05    -10.05 ***  1.05    -10.05 ***  1.13
Number of siblings      -1.42 ***  0.26     -1.42 ***  0.26     -1.42 ***  0.26     -1.42 ***  0.25
Nuclear family          -4.11 **   1.19     -4.11 **   1.19     -4.11 **   1.19     -4.11 **   1.23
Socioeconomic status     0.44 ***  0.06      0.44 ***  0.05      0.44 ***  0.05      0.44 ***  0.06
Indigenous             -16.41 ***  2.60    -16.40 ***  2.55    -16.40 ***  2.55    -16.40 ***  2.67
Intercept              360.94 ***  9.28    360.99 ***  9.14    360.99 ***  9.14    360.99 ***  9.68
Level 1 variance         2414 ***   63%      2415 ***   63%      2415 ***   63%      2415 ***   63%
Level 2 variance         1413 ***   37%      1418 ***   37%      1418 ***   37%      1419 ***   37%

For the two variance rows, the second column of each program is the percentage of total variance.

The main difference among the programs lies in the type of dependent variable that can be modeled. HLM and GLLAMM can fit
two-level or three-level models with discrete or binary outcomes, whereas XTMIXED in STATA and MIXED in SPSS can be used
only with continuous outcomes. One disadvantage of STATA is that XTMIXED and GLLAMM cannot be estimated in combination
with sampling weights, whereas SPSS and HLM allow the user to apply them. HLM has the further advantage that sampling
weights can be used at each level.