Profile Likelihood Method
To estimate $\psi$ and $\lambda$ one can use $(\hat{\psi}_n, \hat{\lambda}_n) = \arg\max_{\psi,\lambda} L_n(\psi, \lambda)$. However, this can be difficult to maximise directly. Instead let us consider a different method, which may, sometimes, be easier to evaluate. Suppose, for now, that $\psi$ is known; then we rewrite the likelihood as $L_n(\psi, \lambda) = L_\psi(\lambda)$ (to show that $\psi$ is fixed but $\lambda$ varies). To estimate $\lambda$ we maximise $L_\psi(\lambda)$ with respect to $\lambda$, i.e.
\[
\hat{\lambda}_\psi = \arg\max_\lambda L_\psi(\lambda).
\]
In reality $\psi$ is unknown, hence for each $\psi$ we can evaluate $\hat{\lambda}_\psi$. Note that for each $\psi$, we have a new curve $L_\psi(\lambda)$ over $\lambda$. Now to estimate $\psi$, we evaluate the maximum of $L_\psi(\lambda)$ over $\lambda$, and choose the $\psi$ which is the maximum over all these curves. In other words, we evaluate
\[
\hat{\psi}_n = \arg\max_\psi L_\psi(\hat{\lambda}_\psi) = \arg\max_\psi L_n(\psi, \hat{\lambda}_\psi).
\]
A bit of logical deduction shows that $\hat{\psi}_n$ and $\hat{\lambda}_{\hat{\psi}_n}$ are the maximum likelihood estimators, i.e. $(\hat{\psi}_n, \hat{\lambda}_n) = \arg\max_{\psi,\lambda} L_n(\psi, \lambda)$.
We note that we have profiled out the nuisance parameter $\lambda$, and the profile likelihood $L_\psi(\hat{\lambda}_\psi) = L_n(\psi, \hat{\lambda}_\psi)$ is in terms of the parameter of interest $\psi$.
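Schematically, the two-step maximisation can be coded as follows. This is a minimal sketch (not from the notes), using the normal mean/variance model as a stand-in with $\psi$ the mean (parameter of interest) and $\lambda$ the variance (nuisance); all function and variable names are illustrative.

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)
data = rng.normal(loc=1.0, scale=2.0, size=200)

def neg_loglik(psi, lam, x):
    # -L_n(psi, lam): psi is the mean, lam the variance of a normal sample,
    # standing in for a generic (interest, nuisance) pair.
    return 0.5 * np.sum((x - psi) ** 2) / lam + 0.5 * len(x) * np.log(lam)

def neg_profile_loglik(psi, x):
    # Inner step: maximise L_n(psi, lam) over the nuisance lam with psi held fixed,
    # returning -L_n(psi, lam_hat_psi).
    inner = minimize_scalar(lambda lam: neg_loglik(psi, lam, x),
                            bounds=(1e-6, 100.0), method="bounded")
    return inner.fun

# Outer step: maximise the profile likelihood over psi.
outer = minimize_scalar(lambda psi: neg_profile_loglik(psi, data),
                        bounds=(-10.0, 10.0), method="bounded")
print("profile MLE of psi:", outer.x)   # agrees with the joint MLE (the sample mean)
```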
The advantage of this procedure is best illustrated through some examples.
Example 3.1.1 (The Weibull distribution) Let us suppose that $\{Y_i\}$ are iid random variables from a Weibull distribution with density $f(y; \alpha, \theta) = \frac{\alpha y^{\alpha-1}}{\theta^{\alpha}}\exp(-(y/\theta)^{\alpha})$. We know from Example 2.2.2 that if $\alpha$ were known, an explicit expression for the MLE of $\theta$ can be derived; it is
\[
\hat{\theta}_\alpha = \Big(\frac{1}{n}\sum_{i=1}^n Y_i^{\alpha}\Big)^{1/\alpha}.
\]
Substituting $\hat{\theta}_\alpha$ into the likelihood gives the profile likelihood $L_n(\alpha, \hat{\theta}_\alpha)$, which is a function of $\alpha$ alone; maximising it over $\alpha$ yields the estimator $\hat{\alpha}_n$. Therefore, the maximum likelihood estimator of $\theta$ is $(\frac{1}{n}\sum_{i=1}^n Y_i^{\hat{\alpha}_n})^{1/\hat{\alpha}_n}$. We observe that evaluating $\hat{\alpha}_n$ can be tricky, but it is no worse than maximising the likelihood $L_n(\alpha, \theta)$ over $\alpha$ and $\theta$ jointly.
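As a rough numerical illustration of this profiling step (simulated data, not from the notes): the closed-form $\hat{\theta}_\alpha$ is substituted into the log-likelihood and the resulting one-dimensional profile likelihood is maximised over $\alpha$.

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(1)
y = rng.weibull(a=1.5, size=300) * 2.0      # simulated data: shape alpha=1.5, scale theta=2.0

def neg_profile_loglik(alpha, y):
    n = len(y)
    theta_alpha = np.mean(y ** alpha) ** (1.0 / alpha)     # closed-form theta_hat for fixed alpha
    # Weibull log-likelihood with theta replaced by theta_hat(alpha)
    ll = (n * np.log(alpha) - n * alpha * np.log(theta_alpha)
          + (alpha - 1) * np.sum(np.log(y)) - np.sum((y / theta_alpha) ** alpha))
    return -ll

res = minimize_scalar(lambda a: neg_profile_loglik(a, y),
                      bounds=(0.1, 10.0), method="bounded")
alpha_hat = res.x
theta_hat = np.mean(y ** alpha_hat) ** (1.0 / alpha_hat)
print("alpha_hat:", alpha_hat, "theta_hat:", theta_hat)
```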
As we mentioned above, we are not interested in the nuisance parameter $\lambda$ and are only interested in testing and constructing CIs for $\psi$. In this case, we are interested in the limiting distribution of the MLE $\hat{\psi}_n$. Using Theorem 2.6.2(ii) we have
\[
\sqrt{n}\begin{pmatrix} \hat{\psi}_n - \psi_0 \\ \hat{\lambda}_n - \lambda_0 \end{pmatrix} \overset{D}{\to} N\left(0, \begin{pmatrix} I_{\psi\psi} & I_{\psi\lambda} \\ I_{\lambda\psi} & I_{\lambda\lambda} \end{pmatrix}^{-1}\right),
\]
where
\[
\begin{pmatrix} I_{\psi\psi} & I_{\psi\lambda} \\ I_{\lambda\psi} & I_{\lambda\lambda} \end{pmatrix}
= \begin{pmatrix} \mathrm{E}\big(-\frac{\partial^2 \log f(X_i;\psi,\lambda)}{\partial \psi^2}\big) & \mathrm{E}\big(-\frac{\partial^2 \log f(X_i;\psi,\lambda)}{\partial \psi \partial \lambda}\big) \\ \mathrm{E}\big(-\frac{\partial^2 \log f(X_i;\psi,\lambda)}{\partial \lambda \partial \psi}\big) & \mathrm{E}\big(-\frac{\partial^2 \log f(X_i;\psi,\lambda)}{\partial \lambda^2}\big) \end{pmatrix}. \qquad (3.1)
\]
To derive an exact expression for the limiting variance of $\sqrt{n}(\hat{\psi}_n - \psi_0)$, we use the block inverse matrix identity, which gives
\[
\sqrt{n}(\hat{\psi}_n - \psi_0) \overset{D}{\to} N\big(0, (I_{\psi\psi} - I_{\psi\lambda} I_{\lambda\lambda}^{-1} I_{\lambda\psi})^{-1}\big). \qquad (3.3)
\]
Thus if $\psi$ is a scalar we can use the above to construct confidence intervals for $\psi$.
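A quick numerical sanity check of the block-inverse step (toy numbers, purely illustrative): the $(\psi, \psi)$ entry of $I(\theta)^{-1}$ obtained by direct inversion agrees with the inverse of the Schur complement $I_{\psi\psi} - I_{\psi\lambda} I_{\lambda\lambda}^{-1} I_{\lambda\psi}$.

```python
import numpy as np

# Illustrative information matrix with scalar psi and scalar lambda
I_pp, I_pl, I_lp, I_ll = 2.0, 0.8, 0.8, 1.5
I_full = np.array([[I_pp, I_pl],
                   [I_lp, I_ll]])

# Asymptotic variance of sqrt(n)(psi_hat - psi_0): the (psi, psi) entry of I^{-1} ...
var_psi_direct = np.linalg.inv(I_full)[0, 0]
# ... which by the block-inverse identity equals the inverse Schur complement
var_psi_schur = 1.0 / (I_pp - I_pl * I_lp / I_ll)

print(var_psi_direct, var_psi_schur)   # identical up to rounding
```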
3.1.2 The score and the log-likelihood ratio for the profile likelihood
To ease notation, let us suppose that $\psi_0$ and $\lambda_0$ are the true parameters in the distribution. We now consider the log-likelihood ratio
\[
2\Big\{\max_{\psi,\lambda} L_n(\psi, \lambda) - \max_\lambda L_n(\psi_0, \lambda)\Big\}, \qquad (3.4)
\]
where $\psi_0$ is the true parameter. However, deriving the limiting distribution of this statistic is a little more complicated than for the log-likelihood ratio test that does not involve nuisance parameters. This is because directly applying a Taylor expansion does not work, since such an expansion is usually made about the true parameters. We observe that
\[
2\Big\{\max_{\psi,\lambda} L_n(\psi,\lambda) - \max_\lambda L_n(\psi_0,\lambda)\Big\}
= \underbrace{2\Big\{\max_{\psi,\lambda} L_n(\psi,\lambda) - L_n(\psi_0,\lambda_0)\Big\}}_{\approx \chi^2_{p+q}} - \underbrace{2\Big\{\max_\lambda L_n(\psi_0,\lambda) - L_n(\psi_0,\lambda_0)\Big\}}_{\approx \chi^2_{q}}.
\]
It seems reasonable that the difference may be a $\chi^2_p$, but it is really not clear why. Below we show, by using a few Taylor expansions, why this is true.
In the theorem below we will derive the distribution of the score and the nested log-likelihood ratio.
Theorem 3.1.1 Suppose Assumption 2.6.1 holds and that $(\psi_0, \lambda_0)$ are the true parameters. Then we have
\[
\frac{\partial L_n(\psi,\lambda)}{\partial \psi}\Big|_{(\psi_0, \hat{\lambda}_{\psi_0})} \approx \frac{\partial L_n(\psi,\lambda)}{\partial \psi}\Big|_{(\psi_0, \lambda_0)} - I_{\psi\lambda} I_{\lambda\lambda}^{-1}\frac{\partial L_n(\psi,\lambda)}{\partial \lambda}\Big|_{(\psi_0, \lambda_0)}, \qquad (3.5)
\]
\[
\frac{1}{\sqrt{n}}\frac{\partial L_n(\psi,\lambda)}{\partial \psi}\Big|_{(\psi_0, \hat{\lambda}_{\psi_0})} \overset{D}{\to} N\big(0, (I_{\psi\psi} - I_{\psi\lambda} I_{\lambda\lambda}^{-1} I_{\lambda\psi})\big) \qquad (3.6)
\]
and
\[
2\Big\{\max_{\psi,\lambda} L_n(\psi,\lambda) - \max_\lambda L_n(\psi_0,\lambda)\Big\} \overset{D}{\to} \chi^2_p, \qquad (3.7)
\]
where $p$ denotes the dimension of $\psi$. This result is often called Wilks Theorem.
PROOF. We first prove (3.5), which is the basis of the proofs of (3.6) and (3.7). To avoid notational difficulties we will assume that $\frac{\partial L_n(\psi,\lambda)}{\partial \psi}\big|_{(\psi_0,\hat{\lambda}_{\psi_0})}$ and $\frac{\partial L_n(\psi,\lambda)}{\partial \psi}\big|_{(\psi_0,\lambda_0)}$ are univariate random variables.
Our objective is to find an expression for $\frac{\partial L_n(\psi,\lambda)}{\partial \psi}\big|_{(\psi_0,\hat{\lambda}_{\psi_0})}$ in terms of $\frac{\partial L_n(\psi,\lambda)}{\partial \psi}\big|_{(\psi_0,\lambda_0)}$ and $\frac{\partial L_n(\psi,\lambda)}{\partial \lambda}\big|_{(\psi_0,\lambda_0)}$, which will allow us to obtain its variance and asymptotic distribution.
Making a Taylor expansion of $\frac{\partial L_n(\psi,\lambda)}{\partial \psi}\big|_{(\psi_0,\hat{\lambda}_{\psi_0})}$ about $\frac{\partial L_n(\psi,\lambda)}{\partial \psi}\big|_{(\psi_0,\lambda_0)}$ gives
\[
\frac{\partial L_n(\psi,\lambda)}{\partial \psi}\Big|_{(\psi_0,\hat{\lambda}_{\psi_0})} \approx \frac{\partial L_n(\psi,\lambda)}{\partial \psi}\Big|_{(\psi_0,\lambda_0)} + (\hat{\lambda}_{\psi_0} - \lambda_0)\frac{\partial^2 L_n(\psi,\lambda)}{\partial \lambda \partial \psi}\Big|_{(\psi_0,\lambda_0)}.
\]
Notice that we have used $\approx$ instead of $=$ because we have replaced the second derivative by its value at the true parameters. If the sample size is large enough then $\frac{\partial^2 L_n(\psi,\lambda)}{\partial \lambda \partial \psi}\big|_{(\psi_0,\lambda_0)} \approx \mathrm{E}\big(\frac{\partial^2 L_n(\psi,\lambda)}{\partial \lambda \partial \psi}\big|_{(\psi_0,\lambda_0)}\big)$; e.g. in the iid case we have
\[
\frac{1}{n}\frac{\partial^2 L_n(\psi,\lambda)}{\partial \lambda \partial \psi}\Big|_{(\psi_0,\lambda_0)} = \frac{1}{n}\sum_{i=1}^n \frac{\partial^2 \log f(X_i;\psi,\lambda)}{\partial \lambda \partial \psi}\Big|_{(\psi_0,\lambda_0)} \approx \mathrm{E}\Big(\frac{\partial^2 \log f(X_i;\psi,\lambda)}{\partial \lambda \partial \psi}\Big|_{(\psi_0,\lambda_0)}\Big) = -I_{\lambda\psi}.
\]
Therefore
\[
\frac{\partial L_n(\psi,\lambda)}{\partial \psi}\Big|_{(\psi_0,\hat{\lambda}_{\psi_0})} \approx \frac{\partial L_n(\psi,\lambda)}{\partial \psi}\Big|_{(\psi_0,\lambda_0)} - n(\hat{\lambda}_{\psi_0} - \lambda_0)I_{\lambda\psi}. \qquad (3.8)
\]
Next we obtain an expression for $(\hat{\lambda}_{\psi_0} - \lambda_0)$. Since $\hat{\lambda}_{\psi_0}$ maximises $L_n(\psi_0, \lambda)$ over $\lambda$, we have $\frac{\partial L_n(\psi_0,\lambda)}{\partial \lambda}\big|_{\hat{\lambda}_{\psi_0}} = 0$ (if the maximum is not on the boundary). Therefore making a Taylor expansion of $\frac{\partial L_n(\psi_0,\lambda)}{\partial \lambda}\big|_{\hat{\lambda}_{\psi_0}}$ about $\frac{\partial L_n(\psi_0,\lambda)}{\partial \lambda}\big|_{\lambda_0}$ gives
\[
\underbrace{\frac{\partial L_n(\psi_0,\lambda)}{\partial \lambda}\Big|_{\hat{\lambda}_{\psi_0}}}_{=0} \approx \frac{\partial L_n(\psi_0,\lambda)}{\partial \lambda}\Big|_{\lambda_0} + \frac{\partial^2 L_n(\psi_0,\lambda)}{\partial \lambda^2}\Big|_{\lambda_0}(\hat{\lambda}_{\psi_0} - \lambda_0).
\]
Replacing $\frac{\partial^2 L_n(\psi_0,\lambda)}{\partial \lambda^2}\big|_{\lambda_0}$ with $-nI_{\lambda\lambda}$ gives
\[
\frac{\partial L_n(\psi_0,\lambda)}{\partial \lambda}\Big|_{\lambda_0} - nI_{\lambda\lambda}(\hat{\lambda}_{\psi_0} - \lambda_0) \approx 0,
\]
and rearranging the above gives
\[
(\hat{\lambda}_{\psi_0} - \lambda_0) \approx \frac{I_{\lambda\lambda}^{-1}}{n}\frac{\partial L_n(\psi_0,\lambda)}{\partial \lambda}\Big|_{\lambda_0}. \qquad (3.9)
\]
Therefore substituting (3.9) into (3.8) gives
\[
\frac{\partial L_n(\psi,\lambda)}{\partial \psi}\Big|_{(\psi_0,\hat{\lambda}_{\psi_0})} \approx \frac{\partial L_n(\psi,\lambda)}{\partial \psi}\Big|_{(\psi_0,\lambda_0)} - I_{\psi\lambda} I_{\lambda\lambda}^{-1}\frac{\partial L_n(\psi,\lambda)}{\partial \lambda}\Big|_{(\psi_0,\lambda_0)},
\]
and thus we have proved (3.5).
To prove (3.6) we note that the above can be written as
\[
\frac{\partial L_n(\psi,\lambda)}{\partial \psi}\Big|_{(\psi_0,\hat{\lambda}_{\psi_0})} \approx \big(I, \; -I_{\psi\lambda} I_{\lambda\lambda}^{-1}\big)\frac{\partial L_n(\psi,\lambda)}{\partial \theta}\Big|_{(\psi_0,\lambda_0)}, \qquad (3.10)
\]
where $\theta = (\psi, \lambda)$. Now calculating the variance of the right-hand side of (3.10), using (3.1) together with the asymptotic normality of the score at the true parameters, gives (3.6).
Finally, to prove (3.7) we apply Taylor expansions to the decomposition
\[
\begin{aligned}
2\Big\{L_n(\hat{\psi}_n, \hat{\lambda}_n) - L_n(\psi_0, \hat{\lambda}_{\psi_0})\Big\}
&= 2\Big\{L_n(\hat{\psi}_n, \hat{\lambda}_n) - L_n(\psi_0, \lambda_0)\Big\} - 2\Big\{L_n(\psi_0, \hat{\lambda}_{\psi_0}) - L_n(\psi_0, \lambda_0)\Big\} \\
&\approx n(\hat{\theta}_n - \theta_0)' I(\theta_0)(\hat{\theta}_n - \theta_0) - n(\hat{\lambda}_{\psi_0} - \lambda_0)' I_{\lambda\lambda}(\hat{\lambda}_{\psi_0} - \lambda_0), \qquad (3.11)
\end{aligned}
\]
where, by (3.9) and the expansion of the score at the true parameters,
\[
(\hat{\lambda}_{\psi_0} - \lambda_0) \approx \frac{I_{\lambda\lambda}^{-1}}{n}\frac{\partial L_n(\psi_0,\lambda)}{\partial \lambda}\Big|_{(\psi_0,\lambda_0)} \approx I_{\lambda\lambda}^{-1}\Big(I_{\lambda\psi}(\hat{\psi}_n - \psi_0) + I_{\lambda\lambda}(\hat{\lambda}_n - \lambda_0)\Big) = I_{\lambda\lambda}^{-1}\big(I_{\lambda\psi}, \; I_{\lambda\lambda}\big)\big(\hat{\theta}_n - \theta_0\big).
\]
Substituting the above into (3.11) and making lots of cancellations we have
\[
2\Big\{L_n(\hat{\psi}_n, \hat{\lambda}_n) - L_n(\psi_0, \hat{\lambda}_{\psi_0})\Big\} \approx n(\hat{\psi}_n - \psi_0)'\big(I_{\psi\psi} - I_{\psi\lambda} I_{\lambda\lambda}^{-1} I_{\lambda\psi}\big)(\hat{\psi}_n - \psi_0).
\]
Finally, by using (3.3) we substitute $\sqrt{n}(\hat{\psi}_n - \psi_0) \overset{D}{\to} N\big(0, (I_{\psi\psi} - I_{\psi\lambda} I_{\lambda\lambda}^{-1} I_{\lambda\psi})^{-1}\big)$ into the above, which gives the desired result. $\Box$
Remark 3.1.2 (i) The limiting variance of $\sqrt{n}(\hat{\lambda}_{\psi_0} - \lambda_0)$ if $\psi_0$ were known is $I_{\lambda\lambda}^{-1}$, whereas the limiting variance of $\frac{1}{\sqrt{n}}\frac{\partial L_n(\psi,\lambda)}{\partial \psi}\big|_{(\psi_0,\hat{\lambda}_{\psi_0})}$ is $(I_{\psi\psi} - I_{\psi\lambda} I_{\lambda\lambda}^{-1} I_{\lambda\psi})$, and the limiting variance of $\sqrt{n}(\hat{\psi}_n - \psi_0)$ is $(I_{\psi\psi} - I_{\psi\lambda} I_{\lambda\lambda}^{-1} I_{\lambda\psi})^{-1}$.

(ii) It is useful to understand where the term $(I_{\psi\psi} - I_{\psi\lambda} I_{\lambda\lambda}^{-1} I_{\lambda\psi})$ comes from. Consider the problem of linear regression. Suppose $X$ and $Y$ are random variables and we want to construct the best linear predictor of $Y$ given $X$. We know that the best linear predictor is $\hat{Y}(X) = \frac{E(XY)}{E(X^2)}X$, and the residual and mean squared error are
\[
Y - \hat{Y}(X) = Y - \frac{E(XY)}{E(X^2)}X \quad\text{and}\quad E\Big(Y - \frac{E(XY)}{E(X^2)}X\Big)^2 = E(Y^2) - E(XY)E(X^2)^{-1}E(XY).
\]
Compare this expression with the variance in (3.6). We see that in some sense $\frac{\partial L_n(\psi,\lambda)}{\partial \psi}\big|_{(\psi_0,\hat{\lambda}_{\psi_0})}$ can be treated as the residual (error) of the projection of $\frac{\partial L_n(\psi,\lambda)}{\partial \psi}\big|_{(\psi_0,\lambda_0)}$ onto $\frac{\partial L_n(\psi,\lambda)}{\partial \lambda}\big|_{(\psi_0,\lambda_0)}$.
The same quantity can be used in the construction of confidence intervals. By using (3.7) we can construct CIs. For example, to construct a 95% CI for $\psi$ we can use the MLE $\hat{\theta}_n = (\hat{\psi}_n, \hat{\lambda}_n)$ and the profile likelihood (3.7) to give
\[
\Big\{\psi;\; 2\Big[L_n(\hat{\psi}_n, \hat{\lambda}_n) - L_n(\psi, \hat{\lambda}_\psi)\Big] \le \chi^2_p(0.95)\Big\}.
\]
Example 3.1.3 (The normal distribution and confidence intervals) This example
is taken from Davidson (2004), Example 4.31, p129.
We recall that the log-likelihood for $\{Y_i\}$, which are iid random variables from a normal distribution with mean $\mu$ and variance $\sigma^2$, is
\[
L_n(\mu, \sigma^2) = L_\mu(\sigma^2) = -\frac{1}{2\sigma^2}\sum_{i=1}^n (Y_i - \mu)^2 - \frac{n}{2}\log \sigma^2.
\]
Our aim is to use the log-likelihood ratio statistic, analogous to Section 2.8.1, to construct a CI for $\mu$. Thus we treat $\sigma^2$ as the nuisance parameter.
Keeping $\mu$ fixed, the maximum likelihood estimator of $\sigma^2$ is $\hat{\sigma}^2(\mu) = \frac{1}{n}\sum_{i=1}^n (Y_i - \mu)^2$. Rearranging $\hat{\sigma}^2(\mu)$ gives
\[
\hat{\sigma}^2(\mu) = \frac{n-1}{n}s^2\Big(1 + \frac{t_n^2(\mu)}{n-1}\Big),
\]
where $t_n^2(\mu) = n(\bar{Y} - \mu)^2/s^2$ and $s^2 = \frac{1}{n-1}\sum_{i=1}^n (Y_i - \bar{Y})^2$. Substituting $\hat{\sigma}^2(\mu)$ into $L_n(\mu, \sigma^2)$ gives the profile likelihood
\[
L_n(\mu, \hat{\sigma}^2(\mu)) = \underbrace{-\frac{1}{2\hat{\sigma}^2(\mu)}\sum_{i=1}^n (Y_i - \mu)^2}_{= -n/2} - \frac{n}{2}\log \hat{\sigma}^2(\mu)
= -\frac{n}{2} - \frac{n}{2}\log\Big\{\frac{n-1}{n}s^2\Big(1 + \frac{t_n^2(\mu)}{n-1}\Big)\Big\}.
\]
Therefore, using the same argument as in Section 2.8.1, the 95% confidence interval for the mean is
\[
\Big\{\mu;\; 2\Big[L_n(\hat{\mu}, \hat{\sigma}^2(\hat{\mu})) - L_n(\mu, \hat{\sigma}^2(\mu))\Big] \le \chi^2_1(0.95)\Big\} = \Big\{\mu;\; W_n(\mu) \le \chi^2_1(0.95)\Big\}
= \Big\{\mu;\; n\log\Big(1 + \frac{t_n^2(\mu)}{n-1}\Big) \le \chi^2_1(0.95)\Big\}.
\]
However, this is an asymptotic result. With the normal distribution we can obtain the exact distribution. We note that since $\log$ is a monotonic function, the log-likelihood ratio interval is equivalent to
\[
\big\{\mu;\; t_n^2(\mu) \le C_\alpha\big\},
\]
where $C_\alpha$ is an appropriate constant; since $t_n(\mu)$ has a $t$-distribution with $n-1$ degrees of freedom, choosing $C_\alpha$ to be the square of the 97.5% quantile of $t_{n-1}$ gives an exact 95% interval (the classical $t$-interval).
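A short simulation sketch (hypothetical data, not from the notes) comparing the two intervals: the profile log-likelihood ratio interval $\{\mu : n\log(1 + t_n^2(\mu)/(n-1)) \le \chi^2_1(0.95)\}$ computed on a grid, and the exact $t$-interval.

```python
import numpy as np
from scipy.stats import chi2, t

rng = np.random.default_rng(2)
y = rng.normal(loc=5.0, scale=3.0, size=50)
n, ybar, s2 = len(y), y.mean(), y.var(ddof=1)

def W(mu):
    t2 = n * (ybar - mu) ** 2 / s2            # t_n(mu)^2
    return n * np.log(1.0 + t2 / (n - 1))     # profile log-likelihood ratio W_n(mu)

# Profile-likelihood 95% CI: all mu with W(mu) <= chi2_1(0.95), found on a grid
grid = np.linspace(ybar - 5, ybar + 5, 20001)
inside = grid[np.array([W(m) for m in grid]) <= chi2.ppf(0.95, df=1)]
print("profile LR CI:", inside.min(), inside.max())

# Exact t-interval for comparison
half = t.ppf(0.975, df=n - 1) * np.sqrt(s2 / n)
print("t interval   :", ybar - half, ybar + half)
```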
Exercise 3.1 Derive the test for independence (in the case of two-by-two tables) using the log-likelihood ratio test. More precisely, derive the asymptotic distribution of
\[
T = \frac{(O_1 - E_1)^2}{E_1} + \frac{(O_2 - E_2)^2}{E_2} + \frac{(O_3 - E_3)^2}{E_3} + \frac{(O_4 - E_4)^2}{E_4}
\]
under the null that there is no association between the categorical variables $C$ and $R$, where $E_1 = n_3 \times n_1/N$, $E_2 = n_4 \times n_1/N$, $E_3 = n_3 \times n_2/N$ and $E_4 = n_4 \times n_2/N$. State any results that you use.

              C1      C2      Subtotal
  R1          O1      O2      n1
  R2          O3      O4      n2
  Subtotal    n3      n4      N
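For concreteness, the statistic $T$ can be computed as follows (made-up counts, purely illustrative; the degrees of freedom of the limiting chi-square are what the exercise asks you to derive, and for a two-by-two table they turn out to be one).

```python
import numpy as np
from scipy.stats import chi2

# Hypothetical 2x2 table of observed counts O1..O4 (rows R1, R2; columns C1, C2)
O = np.array([[25.0, 15.0],
              [20.0, 40.0]])
row = O.sum(axis=1)            # n1, n2
col = O.sum(axis=0)            # n3, n4
N = O.sum()

E = np.outer(row, col) / N     # E_j = row total x column total / N
T = ((O - E) ** 2 / E).sum()   # the statistic in the exercise

# df = 1 for a 2x2 table (this is what the exercise asks you to verify)
print("T =", T, " chi2_1 95% point =", chi2.ppf(0.95, df=1))
```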
Pivotal Quantities
Pivotal quantities are statistics whose distribution does not depend on any unknown parameters. These include the t-ratio $t = \sqrt{n}(\bar{X} - \mu)/s_n \sim t_{n-1}$ (in the case the data is normal), the F-statistic, etc.
In many applications it is not possible to obtain a pivotal quantity, but a quantity can be asymptotically pivotal. The log-likelihood ratio statistic is one such example (since its asymptotic distribution is a chi-square, which is free of unknown parameters).
Pivotal statistics have many advantages. The main one is that they avoid the need to estimate extra parameters. Moreover, they are also useful in developing bootstrap methods, etc.
3.1.4 The score statistic in the presence of nuisance parameters
We recall that we used Theorem 3.1.1 to obtain the distribution of $2\{\max_{\psi,\lambda} L_n(\psi,\lambda) - \max_\lambda L_n(\psi_0,\lambda)\}$ under the null; we now consider the score test.
We recall that under the null $H_0: \psi = \psi_0$ the derivative $\frac{\partial L_n(\psi,\lambda)}{\partial \lambda}\big|_{(\psi_0, \hat{\lambda}_{\psi_0})} = 0$, but the derivative with respect to $\psi$, $\frac{\partial L_n(\psi,\lambda)}{\partial \psi}\big|_{(\psi_0, \hat{\lambda}_{\psi_0})}$, is not necessarily zero. However, by (3.6),
\[
n^{-1/2}\frac{\partial L_n(\psi,\lambda)}{\partial \psi}\Big|_{(\psi_0, \hat{\lambda}_{\psi_0})} \overset{D}{\to} N\big(0, I_{\psi\psi} - I_{\psi\lambda} I_{\lambda\lambda}^{-1} I_{\lambda\psi}\big), \qquad (3.14)
\]
which can be used as the basis of a test for $H_0: \psi = \psi_0$.
The log-likelihood ratio test and the score test are asymptotically equivalent. There
are advantages and disadvantages of both.
(i) An advantage of the log-likelihood ratio test is that we do not need to calculate the
information matrix.
(ii) An advantage of the score test is that we do not have to evaluate the maximum likelihood estimates under the alternative model.
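As a sketch of how the score test based on (3.14) might be organised in code (this is not from the notes; `score_psi` and the information blocks are user-supplied for whichever model is being tested, and the restricted MLE $\hat{\lambda}_{\psi_0}$ must be computed beforehand):

```python
import numpy as np
from scipy.stats import chi2

def score_test(score_psi, info_blocks, psi0, lam_hat_psi0, n, level=0.95):
    """Score test of H0: psi = psi0, with the nuisance lambda estimated under the null.

    score_psi(psi, lam): returns dL_n/dpsi at (psi, lam) as a 1-d array    (user supplied)
    info_blocks: (I_pp, I_pl, I_lp, I_ll), the blocks of (3.1), as 2-d arrays (user supplied)
    lam_hat_psi0: the restricted MLE of lambda when psi is fixed at psi0
    """
    I_pp, I_pl, I_lp, I_ll = info_blocks
    u = np.atleast_1d(score_psi(psi0, lam_hat_psi0)) / np.sqrt(n)   # n^{-1/2} dL_n/dpsi
    V = I_pp - I_pl @ np.linalg.solve(I_ll, I_lp)                   # variance in (3.14)
    stat = float(u @ np.linalg.solve(V, u))                         # ~ chi^2_p under H0
    p = len(u)
    return stat, 1.0 - chi2.cdf(stat, df=p), stat > chi2.ppf(level, df=p)
```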
3.2 Applications
Suppose that the observations $\{X_t;\, t = 1, \ldots, n\}$ satisfy the model
\[
X_t = A\cos(\omega t) + B\sin(\omega t) + \varepsilon_t,
\]
where $\{\varepsilon_t\}$ are iid standard normal random variables and $0 < \omega < \pi$ (thus allowing the case $\omega = \pi/2$, but not the end points $\omega = 0$ or $\pi$). The parameters $A$, $B$, and $\omega$ are real and unknown. Full details can be found in the paper http://www.jstor.org/stable/pdf/2334314.pdf (Walker, 1971, Biometrika).
(i) Ignoring constants, obtain the log-likelihood of $\{X_t\}$. Denote this likelihood as $L_n(A, B, \omega)$.
(ii) Let
\[
S_n(A, B, \omega) = \sum_{t=1}^n X_t^2 - 2\sum_{t=1}^n X_t\big(A\cos(\omega t) + B\sin(\omega t)\big) + \frac{1}{2}n(A^2 + B^2).
\]
Show that
\[
-2L_n(A, B, \omega) - S_n(A, B, \omega) = \frac{(A^2 - B^2)}{2}\sum_{t=1}^n \cos(2t\omega) + AB\sum_{t=1}^n \sin(2t\omega).
\]
Thus show that $|L_n(A, B, \omega) + \frac{1}{2}S_n(A, B, \omega)| = O(1)$ (i.e. the difference does not grow with $n$).
Since $L_n(A, B, \omega)$ and $-\frac{1}{2}S_n(A, B, \omega)$ are asymptotically equivalent, for the rest of this question, use $-\frac{1}{2}S_n(A, B, \omega)$ instead of the likelihood $L_n(A, B, \omega)$.
(v) By using the results in part (iv), show that the Fisher information of $L_n(A, B, \omega)$ (denoted as $I(A, B, \omega)$) is asymptotically equivalent to
\[
2I(A, B, \omega) = \mathrm{E}\big(\nabla^2 S_n\big) = \begin{pmatrix} n & 0 & \frac{n^2}{2}B + O(n) \\ 0 & n & -\frac{n^2}{2}A + O(n) \\ \frac{n^2}{2}B + O(n) & -\frac{n^2}{2}A + O(n) & \frac{n^3}{3}(A^2 + B^2) + O(n^2) \end{pmatrix}.
\]
(vi) Derive the asymptotic variance of the maximum likelihood estimator $\hat{\omega}_n$ derived in part (iv).
You may find the following identities useful:
\[
\sum_{t=1}^n \exp(i\Omega t) = \begin{cases} \frac{\exp(\frac{1}{2}i(n+1)\Omega)\sin(\frac{1}{2}n\Omega)}{\sin(\frac{1}{2}\Omega)} & 0 < \Omega < 2\pi \\ n & \Omega = 0 \text{ or } 2\pi, \end{cases} \qquad (3.16)
\]
the trigonometric identities $\sin(2\Omega) = 2\sin\Omega\cos\Omega$, $\cos(2\Omega) = 2\cos^2(\Omega) - 1 = 1 - 2\sin^2\Omega$, $\exp(i\Omega) = \cos(\Omega) + i\sin(\Omega)$, and
\[
\sum_{t=1}^n t = \frac{n(n+1)}{2}, \qquad \sum_{t=1}^n t^2 = \frac{n(n+1)(2n+1)}{6}.
\]
Solution
Since $\{\varepsilon_t\}$ are standard normal iid random variables, the log-likelihood (ignoring constants) is
\[
L_n(A, B, \omega) = -\frac{1}{2}\sum_{t=1}^n \big(X_t - A\cos(\omega t) - B\sin(\omega t)\big)^2.
\]
If the frequency $\omega$ were known, then the least squares estimator of $A$ and $B$ would be
\[
\begin{pmatrix} \hat{A} \\ \hat{B} \end{pmatrix} = \Big(\frac{1}{n}\sum_{t=1}^n x_t x_t'\Big)^{-1}\frac{1}{n}\sum_{t=1}^n X_t\begin{pmatrix} \cos(\omega t) \\ \sin(\omega t) \end{pmatrix},
\]
where $x_t = (\cos(\omega t), \sin(\omega t))'$. However, because the sine and cosine functions are nearly orthogonal over the sample, we have $n^{-1}\sum_{t=1}^n x_t x_t' \approx \frac{1}{2}I_2$, and therefore
\[
\begin{pmatrix} \hat{A} \\ \hat{B} \end{pmatrix} \approx \frac{2}{n}\sum_{t=1}^n X_t\begin{pmatrix} \cos(\omega t) \\ \sin(\omega t) \end{pmatrix},
\]
which is simple to evaluate! The above argument is not very precise. To make it precise we note that
\[
\begin{aligned}
-2L_n(A, B, \omega) &= \sum_{t=1}^n X_t^2 - 2\sum_{t=1}^n X_t\big(A\cos(\omega t) + B\sin(\omega t)\big) \\
&\quad + A^2\sum_{t=1}^n \cos^2(\omega t) + B^2\sum_{t=1}^n \sin^2(\omega t) + 2AB\sum_{t=1}^n \sin(\omega t)\cos(\omega t) \\
&= \sum_{t=1}^n X_t^2 - 2\sum_{t=1}^n X_t\big(A\cos(\omega t) + B\sin(\omega t)\big) \\
&\quad + \frac{A^2}{2}\sum_{t=1}^n\big(1 + \cos(2t\omega)\big) + \frac{B^2}{2}\sum_{t=1}^n\big(1 - \cos(2t\omega)\big) + AB\sum_{t=1}^n \sin(2t\omega) \\
&= \sum_{t=1}^n X_t^2 - 2\sum_{t=1}^n X_t\big(A\cos(\omega t) + B\sin(\omega t)\big) + \frac{n}{2}(A^2 + B^2) \\
&\quad + \frac{(A^2 - B^2)}{2}\sum_{t=1}^n \cos(2t\omega) + AB\sum_{t=1}^n \sin(2t\omega) \\
&= S_n(A, B, \omega) + \frac{(A^2 - B^2)}{2}\sum_{t=1}^n \cos(2t\omega) + AB\sum_{t=1}^n \sin(2t\omega),
\end{aligned}
\]
where
\[
S_n(A, B, \omega) = \sum_{t=1}^n X_t^2 - 2\sum_{t=1}^n X_t\big(A\cos(\omega t) + B\sin(\omega t)\big) + \frac{n}{2}(A^2 + B^2).
\]
The important point about the above is that $n^{-1}S_n(A, B, \omega)$ is bounded away from zero, however $n^{-1}\sum_{t=1}^n \sin(2\omega t)$ and $n^{-1}\sum_{t=1}^n \cos(2\omega t)$ both converge to zero (at the rate $n^{-1}$, though not uniformly over $\omega$); use identity (3.16). Thus $S_n(A, B, \omega)$ is the dominant term in $-2L_n(A, B, \omega)$.
Thus, ignoring the $O(1)$ term and differentiating $S_n(A, B, \omega)$ with respect to $A$ and $B$ (keeping $\omega$ fixed) gives the estimators
\[
\begin{pmatrix} \hat{A}_n(\omega) \\ \hat{B}_n(\omega) \end{pmatrix} = \frac{2}{n}\sum_{t=1}^n X_t\begin{pmatrix} \cos(\omega t) \\ \sin(\omega t) \end{pmatrix}.
\]
Thus we have "profiled out" the nuisance parameters $A$ and $B$.
Using the approximation $S_n(\hat{A}_n(\omega), \hat{B}_n(\omega), \omega)$ we have
\[
L_n(\hat{A}_n(\omega), \hat{B}_n(\omega), \omega) = -\frac{1}{2}S_p(\omega) + O(1),
\]
where
\[
S_p(\omega) = \sum_{t=1}^n X_t^2 - 2\sum_{t=1}^n X_t\big(\hat{A}_n(\omega)\cos(\omega t) + \hat{B}_n(\omega)\sin(\omega t)\big) + \frac{n}{2}\big(\hat{A}_n(\omega)^2 + \hat{B}_n(\omega)^2\big)
= \sum_{t=1}^n X_t^2 - \frac{n}{2}\big(\hat{A}_n(\omega)^2 + \hat{B}_n(\omega)^2\big).
\]
Thus
\[
\arg\max_\omega L_n(\hat{A}_n(\omega), \hat{B}_n(\omega), \omega) \approx \arg\max_\omega \Big(-\frac{1}{2}S_p(\omega)\Big) = \arg\max_\omega \big(\hat{A}_n(\omega)^2 + \hat{B}_n(\omega)^2\big).
\]
Thus
\[
\hat{\omega}_n = \arg\max_\omega \Big(-\frac{1}{2}S_p(\omega)\Big) = \arg\max_\omega \big(\hat{A}_n(\omega)^2 + \hat{B}_n(\omega)^2\big) = \arg\max_\omega \Big|\sum_{t=1}^n X_t\exp(it\omega)\Big|^2,
\]
which (up to a constant) is the periodogram of the data.
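This arg-max is easy to compute in practice. The sketch below (simulated data, not from the notes) evaluates $|\sum_t X_t e^{it\omega}|^2$ on a grid of frequencies and takes the maximiser; for full accuracy a finer grid or a local optimisation step around the grid maximiser would be needed.

```python
import numpy as np

rng = np.random.default_rng(3)
n, A, B, omega = 500, 1.0, 0.5, 1.3
t = np.arange(1, n + 1)
X = A * np.cos(omega * t) + B * np.sin(omega * t) + rng.normal(size=n)

# Profile objective |sum_t X_t exp(i t w)|^2 evaluated over a grid of frequencies
omegas = np.linspace(0.01, np.pi - 0.01, 10001)
obj = np.array([np.abs(np.sum(X * np.exp(1j * w * t))) ** 2 for w in omegas])
omega_hat = omegas[np.argmax(obj)]

A_hat = 2.0 / n * np.sum(X * np.cos(omega_hat * t))   # profiled A_hat(omega_hat)
B_hat = 2.0 / n * np.sum(X * np.sin(omega_hat * t))   # profiled B_hat(omega_hat)
print(omega_hat, A_hat, B_hat)
```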
(iv) Differentiating both sides of (3.15) with respect to $\Omega$, and considering the real and imaginary parts, gives $\sum_{t=1}^n t\cos(\Omega t) = O(n)$ and $\sum_{t=1}^n t\sin(\Omega t) = O(n)$. Differentiating both sides of (3.15) twice with respect to $\Omega$ gives the analogous bounds for the second-order terms, $\sum_{t=1}^n t^2\cos(\Omega t) = O(n^2)$ and $\sum_{t=1}^n t^2\sin(\Omega t) = O(n^2)$.
(v) In order to obtain the rate of convergence of the estimators $\hat{\omega}_n$, $\hat{A}(\hat{\omega}_n)$, $\hat{B}(\hat{\omega}_n)$, we evaluate the Fisher information of $L_n$ (the inverse of which will give us the limiting rate of convergence). For convenience, rather than take the second derivative of $L_n$ we evaluate the second derivative of $S_n(A, B, \omega)$ (though you will find that, in the limit, the second derivatives of $-2L_n$ and $S_n(A, B, \omega)$ are the same).
Differentiating $S_n(A, B, \omega) = \sum_{t=1}^n X_t^2 - 2\sum_{t=1}^n X_t\big(A\cos(\omega t) + B\sin(\omega t)\big) + \frac{1}{2}n(A^2 + B^2)$ twice with respect to $A$, $B$ and $\omega$ gives
\[
\begin{aligned}
\frac{\partial S_n}{\partial A} &= -2\sum_{t=1}^n X_t\cos(\omega t) + An \\
\frac{\partial S_n}{\partial B} &= -2\sum_{t=1}^n X_t\sin(\omega t) + Bn \\
\frac{\partial S_n}{\partial \omega} &= 2A\sum_{t=1}^n X_t t\sin(\omega t) - 2B\sum_{t=1}^n X_t t\cos(\omega t),
\end{aligned}
\]
and $\frac{\partial^2 S_n}{\partial A^2} = n$, $\frac{\partial^2 S_n}{\partial B^2} = n$, $\frac{\partial^2 S_n}{\partial A \partial B} = 0$,
\[
\begin{aligned}
\frac{\partial^2 S_n}{\partial \omega \partial A} &= 2\sum_{t=1}^n X_t t\sin(\omega t) \\
\frac{\partial^2 S_n}{\partial \omega \partial B} &= -2\sum_{t=1}^n X_t t\cos(\omega t) \\
\frac{\partial^2 S_n}{\partial \omega^2} &= 2\sum_{t=1}^n t^2 X_t\big(A\cos(\omega t) + B\sin(\omega t)\big).
\end{aligned}
\]
Taking expectations (recall $\mathrm{E}(X_t) = A\cos(\omega t) + B\sin(\omega t)$) and using the bounds from part (iv), we have
\[
\mathrm{E}\Big(\frac{\partial^2 S_n}{\partial \omega \partial A}\Big) = 2\sum_{t=1}^n t\sin(\omega t)\big(A\cos(\omega t) + B\sin(\omega t)\big) = B\sum_{t=1}^n t + O(n) = \frac{Bn^2}{2} + O(n).
\]
Using a similar argument we can show that $\mathrm{E}\big(\frac{\partial^2 S_n}{\partial \omega \partial B}\big) = -\frac{An^2}{2} + O(n)$ and
\[
\mathrm{E}\Big(\frac{\partial^2 S_n}{\partial \omega^2}\Big) = 2\sum_{t=1}^n t^2\big(A\cos(\omega t) + B\sin(\omega t)\big)^2 = (A^2 + B^2)\frac{n(n+1)(2n+1)}{6} + O(n^2) = \frac{(A^2 + B^2)n^3}{3} + O(n^2).
\]
Since $\mathrm{E}(\nabla^2 L_n) \approx -\frac{1}{2}\mathrm{E}(\nabla^2 S_n)$, this gives the required result.
(vi) Noting that the asymptotic variance of the profile likelihood estimator $\hat{\omega}_n$ is
\[
\Big(I_{\omega,\omega} - I_{\omega,(A,B)}\, I_{(A,B),(A,B)}^{-1}\, I_{(A,B),\omega}\Big)^{-1},
\]
by substituting the blocks from (v) into the above we have
\[
\Big(\frac{(A^2 + B^2)n^3}{6} - \frac{(A^2 + B^2)n^3}{8} + O(n^2)\Big)^{-1} = \Big(\frac{(A^2 + B^2)n^3}{24} + O(n^2)\Big)^{-1} \approx \frac{24}{(A^2 + B^2)n^3}.
\]
Thus we observe that the asymptotic variance of $\hat{\omega}_n$ is $O(n^{-3})$.
Typically estimators have a variance of order $O(n^{-1})$, so we see that the estimator $\hat{\omega}_n$ converges to the true parameter far faster than expected. Thus the estimator is extremely good compared with the majority of parameter estimators.
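A small Monte Carlo sketch (not from the notes) can be used to see this $n^{-3}$ rate empirically: the periodogram is maximised on a local grid around the true frequency, repeated over replications, and the scaled variance $n^3 \mathrm{var}(\hat{\omega}_n)$ is printed; it should be roughly stable across $n$ if the rate is correct.

```python
import numpy as np

rng = np.random.default_rng(4)
A, B, omega0 = 1.0, 0.5, 1.3
for n in (100, 200, 400):
    t = np.arange(1, n + 1)
    omegas = np.linspace(1.0, 1.6, 4001)            # local grid around the true frequency
    E = np.exp(1j * np.outer(omegas, t))            # grid-by-time matrix of e^{i t w}
    est = []
    for _ in range(200):
        X = A * np.cos(omega0 * t) + B * np.sin(omega0 * t) + rng.normal(size=n)
        est.append(omegas[np.argmax(np.abs(E @ X) ** 2)])
    print(n, np.var(est) * n ** 3)                  # roughly stable if var(omega_hat) = O(n^-3)
```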
It is clear from the definition that what defines a survival function is that $F_i(t)$ is positive, $F_i(0) = 1$ and $F_i(\infty) = 0$. The density is easily derived from the survival function by taking the negative derivative: $f_i(t) = -\frac{dF_i(t)}{dt}$.
To model the influence the regressors have on the survival time, the Cox proportional hazards model is often used, with the exponential distribution as the baseline distribution and $\psi(x_i; \beta)$ a positive "link" function (typically, we use $\psi(x_i; \beta) = \exp(\beta x_i)$ as the link function). More precisely, the survival function of $T_i$ is
\[
F_i(t) = F_0(t)^{\psi(x_i;\beta)},
\]
where $F_0(t) = \exp(-t/\theta)$. Not all the survival times of the electrical components are observed; censoring can arise. Hence we observe $Y_i = \min(T_i, c_i)$, where $c_i$ is the (non-random) censoring time, and $\delta_i$, where $\delta_i$ is the indicator variable with $\delta_i = 1$ denoting censoring of the $i$th component and $\delta_i = 0$ denoting that it is not censored. The parameters $\beta$ and $\theta$ are unknown.
(ii) Compute the profile likelihood of the regression parameters $\beta$, profiling out the baseline parameter $\theta$.
Solution
For this example, the logarithms of the density and the survival function are
\[
\begin{aligned}
\log f_i(t) &= \log \psi(x_i;\beta) + \big[\psi(x_i;\beta) - 1\big]\log F_0(t) + \log f_0(t) \\
&= \log \psi(x_i;\beta) - \big[\psi(x_i;\beta) - 1\big]\frac{t}{\theta} - \log\theta - \frac{t}{\theta}, \\
\log F_i(t) &= \psi(x_i;\beta)\log F_0(t) = -\psi(x_i;\beta)\frac{t}{\theta}.
\end{aligned}
\]
Since
\[
f_i(y_i, \delta_i) = \begin{cases} f_i(y_i) = \psi(x_i;\beta)\, F_0(y_i)^{\psi(x_i;\beta)-1} f_0(y_i) & \delta_i = 0 \\ F_i(y_i) = F_0(y_i)^{\psi(x_i;\beta)} & \delta_i = 1, \end{cases}
\]
the log-likelihood of the observations $\{(Y_i, \delta_i)\}$ is (ignoring constants)
\[
L_n(\beta, \theta) = \sum_{i=1}^n (1 - \delta_i)\big[\log \psi(x_i;\beta) - \log\theta\big] - \sum_{i=1}^n \psi(x_i;\beta)\frac{Y_i}{\theta}.
\]
Differentiating the above with respect to $\beta$ and $\theta$ gives
\[
\begin{aligned}
\frac{\partial L_n}{\partial \beta} &= \sum_{i=1}^n (1 - \delta_i)\frac{\nabla_\beta \psi(x_i;\beta)}{\psi(x_i;\beta)} - \sum_{i=1}^n \nabla_\beta \psi(x_i;\beta)\frac{Y_i}{\theta} \\
\frac{\partial L_n}{\partial \theta} &= -\sum_{i=1}^n (1 - \delta_i)\frac{1}{\theta} + \sum_{i=1}^n \psi(x_i;\beta)\frac{Y_i}{\theta^2}.
\end{aligned}
\]
(ii) Instead we keep $\beta$ fixed, differentiate the likelihood with respect to $\theta$ and equate to zero; this gives
\[
\frac{\partial L_n}{\partial \theta} = -\sum_{i=1}^n (1 - \delta_i)\frac{1}{\theta} + \sum_{i=1}^n \psi(x_i;\beta)\frac{Y_i}{\theta^2} = 0,
\]
and solving for $\theta$ gives
\[
\hat{\theta}(\beta) = \frac{\sum_{i=1}^n \psi(x_i;\beta)Y_i}{\sum_{i=1}^n (1 - \delta_i)}.
\]
This gives us the best estimator of $\theta$ for a given $\beta$. Next we find the best estimator of $\beta$. The profile likelihood (after profiling out $\theta$) is
\[
\ell_P(\beta) = L_n(\beta, \hat{\theta}(\beta)) = \sum_{i=1}^n (1 - \delta_i)\big[\log\psi(x_i;\beta) - \log\hat{\theta}(\beta)\big] - \sum_{i=1}^n \psi(x_i;\beta)\frac{Y_i}{\hat{\theta}(\beta)},
\]
which we maximise over $\beta$ to obtain the estimator of $\beta$.
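A numerical sketch of this profiling (simulated censored data, not from the notes, with the link $\psi(x;\beta) = \exp(\beta x)$): $\hat{\theta}(\beta)$ is plugged in and $\ell_P(\beta)$ is maximised over $\beta$ with a one-dimensional optimiser.

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(5)
n = 400
x = rng.normal(size=n)
beta0, theta0 = 0.7, 2.0
T = rng.exponential(scale=theta0 / np.exp(beta0 * x))   # F_i(t) = exp(-psi_i t / theta)
c = rng.uniform(1.0, 6.0, size=n)                        # censoring times
Y = np.minimum(T, c)
delta = (T > c).astype(float)                            # delta_i = 1 means censored

def neg_profile_loglik(beta):
    psi = np.exp(beta * x)                               # link psi(x_i; beta) = exp(beta x_i)
    theta_b = np.sum(psi * Y) / np.sum(1.0 - delta)      # theta_hat(beta)
    lp = np.sum((1.0 - delta) * (beta * x - np.log(theta_b))) - np.sum(psi * Y) / theta_b
    return -lp

res = minimize_scalar(neg_profile_loglik, bounds=(-5.0, 5.0), method="bounded")
print("profile MLE of beta:", res.x)
```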
Suppose we observe $(Y_i, X_i, U_i)$, where
\[
Y_i = \beta X_i + \phi(U_i) + \varepsilon_i,
\]
$(X_i, U_i, \varepsilon_i)$ are iid random variables and $\phi$ is an unknown function. Before analyzing the model we summarize some of its interesting properties:
• When a model does not have a parametric form (i.e. a finite number of parameters cannot describe the model), then we cannot usually obtain the usual $O(n^{-1/2})$ rate. We see that in the above model $\phi(\cdot)$ does not have a parametric form, thus we cannot expect an estimator of it to be $\sqrt{n}$-consistent.

• The model above contains the term $\beta X_i$, which does have a parametric form; can we obtain a $\sqrt{n}$-consistent estimator of $\beta$?
Suppose
\[
Y_i = \phi(U_i) + \varepsilon_i,
\]
where $U_i, \varepsilon_i$ are iid random variables. A classical method for estimating $\phi(\cdot)$ is to use the Nadaraya–Watson estimator. This is basically a local least squares estimator of $\phi(u)$. The estimator $\hat{\phi}_n(u)$ is defined as
\[
\hat{\phi}_n(u) = \arg\min_a \sum_i W_b(u - U_i)(Y_i - a)^2 = \frac{\sum_i W_b(u - U_i)Y_i}{\sum_i W_b(u - U_i)},
\]
where $W(\cdot)$ is a kernel (think local window function) with $\int W(x)\,dx = 1$ and $W_b(u) = b^{-1}W(u/b)$, with $b \to 0$ as $n \to \infty$; thus the window gets narrower and more localized as the sample size grows. Dividing by $\sum_i W_b(u - U_i)$ "removes" the clustering in the locations $\{U_i\}$.
Note that the above can also be treated as an estimator of
\[
\mathrm{E}(Y|U = u) = \int_{\mathbb{R}} y\, f_{Y|U}(y|u)\,dy = \int_{\mathbb{R}} y\,\frac{f_{Y,U}(y,u)}{f_U(u)}\,dy = \phi(u),
\]
in which the joint density $f_{Y,U}$ and the marginal density $f_U$ are replaced by the estimators
\[
\hat{f}_{Y,U}(u, y) = \frac{1}{n}\sum_{i=1}^n \delta_{Y_i}(y)W_b(u - U_i), \qquad
\hat{f}_U(u) = \frac{1}{n}\sum_{i=1}^n W_b(u - U_i),
\]
with $\delta_{Y}(y)$ denoting the Dirac delta function. Note that the above is true because
\[
\int_{\mathbb{R}} y\,\frac{\hat{f}_{Y,U}(y,u)}{\hat{f}_U(u)}\,dy = \frac{1}{\hat{f}_U(u)}\int_{\mathbb{R}} y\, \hat{f}_{Y,U}(y,u)\,dy
= \frac{1}{\hat{f}_U(u)}\int_{\mathbb{R}} y\,\frac{1}{n}\sum_{i=1}^n \delta_{Y_i}(y)W_b(u - U_i)\,dy
= \frac{1}{\hat{f}_U(u)}\frac{1}{n}\sum_{i=1}^n W_b(u - U_i)\underbrace{\int_{\mathbb{R}} y\,\delta_{Y_i}(y)\,dy}_{=Y_i}
= \frac{\sum_i W_b(u - U_i)Y_i}{\sum_i W_b(u - U_i)}.
\]
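A compact numerical sketch of the Nadaraya–Watson estimator just described (simulated data, Gaussian kernel; the bandwidth $b$ is fixed by hand here, whereas in practice it would be chosen by, e.g., cross-validation).

```python
import numpy as np

rng = np.random.default_rng(6)
n = 500
U = rng.uniform(0.0, 1.0, size=n)
Y = np.sin(2 * np.pi * U) + 0.3 * rng.normal(size=n)     # phi(u) = sin(2*pi*u) as a test function

def nw_estimator(u, U, Y, b):
    # W_b(u - U_i) with a Gaussian kernel W; the 1/b factor cancels in the ratio
    w = np.exp(-0.5 * ((u - U) / b) ** 2)
    return np.sum(w * Y) / np.sum(w)

grid = np.linspace(0.05, 0.95, 19)
phi_hat = np.array([nw_estimator(u, U, Y, b=0.05) for u in grid])
print(np.round(phi_hat - np.sin(2 * np.pi * grid), 2))   # estimation error on the grid
```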
To estimate $\beta$, we first profile out $\phi(\cdot)$ (this is the nuisance parameter), which we estimate as if $\beta$ were known. In other words, we suppose that $\beta$ were known and let
\[
Y_i(\beta) = Y_i - \beta X_i = \phi(U_i) + \varepsilon_i.
\]
We then estimate $\phi(\cdot)$ using the Nadaraya–Watson estimator, in other words the $\phi(\cdot)$ which minimises the criterion
\[
\hat{\phi}_\beta(u) = \arg\min_a \sum_i W_b(u - U_i)\big(Y_i(\beta) - a\big)^2 = \frac{\sum_i W_b(u - U_i)Y_i(\beta)}{\sum_i W_b(u - U_i)}
= \frac{\sum_i W_b(u - U_i)Y_i}{\sum_i W_b(u - U_i)} - \beta\,\frac{\sum_i W_b(u - U_i)X_i}{\sum_i W_b(u - U_i)}
:= \hat{G}_b(u) - \beta \hat{H}_b(u), \qquad (3.17)
\]
where
\[
\hat{G}_b(u) = \frac{\sum_i W_b(u - U_i)Y_i}{\sum_i W_b(u - U_i)} \quad\text{and}\quad \hat{H}_b(u) = \frac{\sum_i W_b(u - U_i)X_i}{\sum_i W_b(u - U_i)}.
\]
Thus, given $\beta$, the estimator of $\phi$ and the residuals $\varepsilon_i$ are
\[
\hat{\phi}_\beta(u) = \hat{G}_b(u) - \beta\hat{H}_b(u)
\quad\text{and}\quad
\hat{\varepsilon}_i = Y_i - \beta X_i - \hat{\phi}_\beta(U_i).
\]
Given the estimated residuals $Y_i - \beta X_i - \hat{\phi}_\beta(U_i)$ we can now use least squares to estimate the coefficient $\beta$. We define the least squares criterion
\[
L_n(\beta) = \sum_i \big(Y_i - \beta X_i - \hat{\phi}_\beta(U_i)\big)^2
= \sum_i \big(Y_i - \beta X_i - \hat{G}_b(U_i) + \beta\hat{H}_b(U_i)\big)^2
= \sum_i \big(Y_i - \hat{G}_b(U_i) - \beta[X_i - \hat{H}_b(U_i)]\big)^2,
\]
where $\hat{G}_b(\cdot)$ and $\hat{H}_b(\cdot)$ are defined in (3.17). Using the above estimator of $\phi(\cdot)$, estimating $\beta$ therefore reduces to a standard least squares problem: we minimise $L_n(\beta)$ with respect to $\beta$.
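Putting the pieces together, a sketch of the profiled least squares estimator of $\beta$ (simulated data, not from the notes; this two-step smooth-then-regress scheme is essentially the classical estimator for the partially linear model): $\hat{G}_b(U_i)$ and $\hat{H}_b(U_i)$ are computed by Nadaraya–Watson smoothing, and $\hat{\beta}$ is the least squares coefficient of $Y_i - \hat{G}_b(U_i)$ on $X_i - \hat{H}_b(U_i)$.

```python
import numpy as np

rng = np.random.default_rng(7)
n, beta0, b = 1000, 1.5, 0.05
U = rng.uniform(0.0, 1.0, size=n)
X = np.cos(2 * np.pi * U) + rng.normal(size=n)          # X may depend on U
Y = beta0 * X + np.sin(2 * np.pi * U) + 0.5 * rng.normal(size=n)   # phi(u) = sin(2*pi*u)

def nw_smooth(values, U, b):
    # Nadaraya-Watson smooth of `values` against U, evaluated at each U_i (Gaussian kernel)
    d = (U[:, None] - U[None, :]) / b
    w = np.exp(-0.5 * d ** 2)
    return (w @ values) / w.sum(axis=1)

G = nw_smooth(Y, U, b)                                  # G_b(U_i): smooth of Y on U
H = nw_smooth(X, U, b)                                  # H_b(U_i): smooth of X on U

# Least squares of (Y - G) on (X - H) gives the profiled estimator of beta
beta_hat = np.sum((Y - G) * (X - H)) / np.sum((X - H) ** 2)
print(beta_hat)
```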