theory¶

\(\newcommand{\dtheta}[1]{ \frac{\R{d}}{\R{d} \theta_{ #1}} }\)

Laplace Approximation for Mixed Effects Models¶

Reference¶

TMB: Automatic Differentiation and Laplace Approximation, Kasper Kristensen, Anders Nielsen, Casper W. Berg, Hans Skaug, Bradley M. Bell, Journal of Statistical Software 70, 1-21, April 2016.

Total Likelihood¶

The reference above uses \(f( \theta, u)\) for the negative log of the joint density of \(z\), \(y\), \(u\), and \(\theta\); i.e.,

\[- \log [ \; \B{p} ( y | \theta, u ) \B{p} ( u | \theta ) \; \B{p} ( z | \theta )\B{p} ( \theta ) \; ]\]

Random Likelihood, f(theta, u)¶

In this documentation, we instead use \(f( \theta , u )\) for just the part of the negative log-likelihood that depends on the random effects \(u\); i.e.,

\[f( \theta, u ) = - \log [ \B{p} ( y | \theta, u ) \B{p} ( u | \theta ) ]\]

Assumption¶

The function \(f(\theta, u)\) is assumed to be smooth. Furthermore, there are no constraints on the value of \(u\).

Random Effects Objective¶

Given a value for the fixed effects \(\theta\), the random effects objective is the random likelihood as a function of just the random effects; i.e., \(f( \theta , \cdot )\).

Fixed Likelihood, g(theta)¶

We use \(g( \theta )\) for the part of the likelihood that only depends on the fixed effects \(\theta\);

\[g( \theta ) = - \log [ \B{p} ( z | \theta ) \B{p} ( \theta ) ]\]

The function \(g( \theta )\) may not be smooth; specifically, it can contain absolute values (corresponding to Laplace densities). Furthermore, there may be constraints on the value of \(\theta\).

Optimal Random Effects, u^(theta)¶

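Given the fixed effects \(\theta\), we use \(\hat{u} ( \theta )\) to denote the random effects that minimize the random likelihood \(f( \theta , \cdot )\), and hence maximize the density \(\B{p} ( y | \theta, u ) \B{p} ( u | \theta )\); i.e.,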

\[\hat{u} ( \theta ) = \R{argmin} \; f( \theta, u ) \; \R{w.r.t.} \; u\]

Note that this definition agrees with the other definition for \(\hat{u} ( \theta )\).

Objective¶

Laplace Approximation, h(theta, u)¶

Using the notation above, the Laplace approximation as a function of both the fixed and random effects is

\[h( \theta, u ) = + \frac{1}{2} \log \det f_{u,u} ( \theta, u ) + f( \theta, u ) - \frac{n}{2} \log ( 2 \pi )\]

where \(n\) is the number of random effects.

Laplace Objective, r(theta)¶

We refer to

\[r( \theta ) = h[ \theta , \hat{u} ( \theta ) ] \approx - \log \left[ \int_{-\infty}^{+\infty} \B{p} ( y | \theta, u ) \B{p} ( u | \theta ) \; \B{d} u \right]\]

as the Laplace objective. This corresponds to equation (4) in the Reference.
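As a sanity check on the definitions of \(h( \theta, u )\) and \(r( \theta )\), the following minimal sketch evaluates the Laplace objective for an illustrative model that is not part of the original text: one random effect with \(y \sim N(u, 1)\) and \(u \sim N(0, \theta^2)\). For this model \(f( \theta , \cdot )\) is quadratic, so the Laplace approximation should agree with the exact negative log of the integral. All function names here are hypothetical.

```python
import math

def f(theta, u, y):
    """Random likelihood for the assumed model y ~ N(u, 1), u ~ N(0, theta^2)."""
    neg_log_p_y = 0.5 * (y - u) ** 2 + 0.5 * math.log(2.0 * math.pi)
    neg_log_p_u = 0.5 * (u / theta) ** 2 + 0.5 * math.log(2.0 * math.pi * theta ** 2)
    return neg_log_p_y + neg_log_p_u

def f_uu(theta):
    """Second derivative of f w.r.t. u (constant in u for this model)."""
    return 1.0 + 1.0 / theta ** 2

def u_hat(theta, y):
    """Minimizer of f(theta, .): solves f_u = (u - y) + u / theta^2 = 0."""
    return y / f_uu(theta)

def h(theta, u, y):
    """Laplace approximation h(theta, u) with n = 1 random effect."""
    n = 1
    return 0.5 * math.log(f_uu(theta)) + f(theta, u, y) - 0.5 * n * math.log(2.0 * math.pi)

def r(theta, y):
    """Laplace objective r(theta) = h[theta, u_hat(theta)]."""
    return h(theta, u_hat(theta, y), y)

def exact(theta, y):
    """Exact negative log of the integral: marginally y ~ N(0, 1 + theta^2)."""
    var = 1.0 + theta ** 2
    return 0.5 * y ** 2 / var + 0.5 * math.log(2.0 * math.pi * var)

theta, y = 0.7, 1.3
laplace_value = r(theta, y)
exact_value = exact(theta, y)
print(laplace_value, exact_value)
```

The two printed values agree up to rounding because the integrand is Gaussian in \(u\); for a non-quadratic \(f( \theta , \cdot )\) the Laplace objective is only an approximation.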

Fixed Effects Objective, L(theta)¶

The fixed effects objective, as a function of just the fixed effects, is

\[L ( \theta ) = r( \theta ) + g( \theta )\]

Derivative of Optimal Random Effects¶

Because \(f(\theta, u)\) is smooth, and \(\hat{u} ( \theta )\) is optimal w.r.t. \(u\), we obtain

\[f_u [ \theta , \hat{u} ( \theta ) ] = 0\]

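Differentiating this equation with respect to \(\theta\) gives \(f_{u,\theta} [ \theta , \hat{u} ( \theta ) ] + f_{u,u} [ \theta , \hat{u} ( \theta ) ] \hat{u}_\theta ( \theta ) = 0\). Provided \(f_{u,u} [ \theta , \hat{u} ( \theta ) ]\) is invertible, it follows from the implicit function theorem that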

\[\hat{u}_\theta ( \theta ) = - f_{u,u} \left[ \theta , \hat{u} ( \theta ) \right]^{-1} f_{u,\theta} \left[ \theta , \hat{u} ( \theta ) \right]\]
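The formula for \(\hat{u}_\theta ( \theta )\) can be checked numerically. The sketch below assumes a scalar test function \(f( \theta , u ) = e^u - \theta u + \theta^2 u^2 / 2\), chosen only because it is smooth and strictly convex in \(u\); it is not a model from the original text. It finds \(\hat{u} ( \theta )\) by Newton's method and compares the implicit function theorem formula with a central finite difference.

```python
import math

# Assumed test function: f(theta, u) = exp(u) - theta*u + 0.5*theta^2*u^2

def f_u(theta, u):
    """Partial of f w.r.t. u."""
    return math.exp(u) - theta + theta ** 2 * u

def f_uu(theta, u):
    """Second partial of f w.r.t. u (positive, so f is convex in u)."""
    return math.exp(u) + theta ** 2

def f_u_theta(theta, u):
    """Mixed partial of f w.r.t. u and theta."""
    return -1.0 + 2.0 * theta * u

def u_hat(theta):
    """Solve f_u(theta, u) = 0 by Newton's method, starting at u = 0."""
    u = 0.0
    for _ in range(50):
        step = f_u(theta, u) / f_uu(theta, u)
        u -= step
        if abs(step) < 1e-15:
            break
    return u

theta = 1.2
u = u_hat(theta)

# Implicit function theorem: u_hat_theta = - f_uu^{-1} f_{u,theta}
u_hat_theta_formula = -f_u_theta(theta, u) / f_uu(theta, u)

# Central finite difference of u_hat for comparison
delta = 1e-6
u_hat_theta_fd = (u_hat(theta + delta) - u_hat(theta - delta)) / (2.0 * delta)

print(u_hat_theta_formula, u_hat_theta_fd)
```

With several random effects, the division by `f_uu` would become a linear solve with the Hessian \(f_{u,u}\).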

Derivative of Random Constraints¶

The random constraint function is \(A \; \hat{u} ( \theta )\), where \(A\) is the random constraint matrix. Its derivative is given by

\[\partial_\theta [ A \; \hat{u} ( \theta ) ] = A \; \hat{u}_\theta ( \theta )\]

Derivative of Laplace Objective¶

The derivative of the random part of the objective is given by

\[r_\theta ( \theta ) = h_\theta [ \theta , \hat{u} ( \theta ) ] + h_u [ \theta , \hat{u} ( \theta ) ] \hat{u}_\theta ( \theta )\]

Thus the derivative of \(r ( \theta )\) can be computed using the derivative of \(\hat{u} ( \theta )\) and the partials of \(h( \theta , u )\). Let \(\partial_k\) denote the partial with respect to the \(k\)-th component of the combined vector \(( \theta , u )\). Then

\[\partial_k [ h( \theta , u ) ] = \partial_k [ f( \theta , u ) ] + \frac{1}{2} \sum_{i=0}^{n-1} \sum_{j=0}^{n-1} f_{u,u} ( \theta , u )_{i,j}^{-1} \partial_k [ f_{u,u} ( \theta , u)_{i,j} ]\]

where \(n\) is the number of random effects. Note that \(f_{u,u} ( \theta , u )\) is often sparse and only its non-zero components need be included in the summation. This is discussed in more detail near equation (8) in the Reference. We also note that if \(k\) corresponds to a component of \(u\) then

\[\partial_k ( f[ \theta , \hat{u} ( \theta ) ] ) = 0\]
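The chain rule expression for \(r_\theta ( \theta )\) can be checked in the same way. This sketch again assumes the scalar test function \(f( \theta , u ) = e^u - \theta u + \theta^2 u^2 / 2\) (an illustration, not from the original text), for which \(n = 1\) and the double sum reduces to \(\partial_k [ f_{u,u} ] / ( 2 f_{u,u} )\). It compares the formula with a finite difference of \(r( \theta )\).

```python
import math

# Assumed test function: f(theta, u) = exp(u) - theta*u + 0.5*theta^2*u^2

def f(theta, u):
    return math.exp(u) - theta * u + 0.5 * theta ** 2 * u ** 2

def f_u(theta, u):
    return math.exp(u) - theta + theta ** 2 * u

def f_uu(theta, u):
    return math.exp(u) + theta ** 2

def u_hat(theta):
    """Solve f_u(theta, u) = 0 by Newton's method."""
    u = 0.0
    for _ in range(50):
        step = f_u(theta, u) / f_uu(theta, u)
        u -= step
        if abs(step) < 1e-15:
            break
    return u

def h(theta, u):
    """Laplace approximation with n = 1."""
    return 0.5 * math.log(f_uu(theta, u)) + f(theta, u) - 0.5 * math.log(2.0 * math.pi)

def h_theta(theta, u):
    """Partial of h w.r.t. theta: f_theta + (1/2) f_uu^{-1} d/dtheta f_uu."""
    f_theta = -u + theta * u ** 2
    d_theta_f_uu = 2.0 * theta
    return f_theta + 0.5 * d_theta_f_uu / f_uu(theta, u)

def h_u(theta, u):
    """Partial of h w.r.t. u: f_u + (1/2) f_uu^{-1} d/du f_uu."""
    d_u_f_uu = math.exp(u)
    return f_u(theta, u) + 0.5 * d_u_f_uu / f_uu(theta, u)

def u_hat_theta(theta):
    """Derivative of the optimal random effect (implicit function theorem)."""
    u = u_hat(theta)
    f_u_theta = -1.0 + 2.0 * theta * u
    return -f_u_theta / f_uu(theta, u)

def r(theta):
    """Laplace objective."""
    return h(theta, u_hat(theta))

def r_theta(theta):
    """Chain rule: h_theta + h_u * u_hat_theta, evaluated at u_hat(theta)."""
    u = u_hat(theta)
    return h_theta(theta, u) + h_u(theta, u) * u_hat_theta(theta)

theta, delta = 1.2, 1e-6
r_theta_formula = r_theta(theta)
r_theta_fd = (r(theta + delta) - r(theta - delta)) / (2.0 * delta)
print(r_theta_formula, r_theta_fd)
```

Note that `f_u` vanishes at `u_hat(theta)`, so only the log-determinant term contributes to `h_u` there.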

Approximate Optimal Random Effects¶

First Order, U(beta, theta, u)¶

We define the function

\[U ( \beta , \theta , u ) = u - f_{u,u} ( \theta , u )^{-1} f_u ( \beta , u )\]

It follows that

\[U \left[ \theta , \theta , \hat{u} ( \theta ) \right] = \hat{u} ( \theta )\]

\[U_{\beta} [ \theta , \theta , \hat{u} ( \theta ) ] = \hat{u}_\theta ( \theta )\]
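The two properties of \(U\) follow because \(f_u\) vanishes at \(\hat{u} ( \theta )\) and because \(U_\beta = - f_{u,u} ( \theta , u )^{-1} f_{u,\beta} ( \beta , u )\). The sketch below checks both numerically, assuming the scalar test function \(f( \theta , u ) = e^u - \theta u + \theta^2 u^2 / 2\), which is an illustration and not part of the original text.

```python
import math

# Assumed test function: f(theta, u) = exp(u) - theta*u + 0.5*theta^2*u^2

def f_u(theta, u):
    return math.exp(u) - theta + theta ** 2 * u

def f_uu(theta, u):
    return math.exp(u) + theta ** 2

def u_hat(theta):
    """Solve f_u(theta, u) = 0 by Newton's method."""
    u = 0.0
    for _ in range(50):
        step = f_u(theta, u) / f_uu(theta, u)
        u -= step
        if abs(step) < 1e-15:
            break
    return u

def U(beta, theta, u):
    """One Newton step for f_u(beta, .) = 0 using the Hessian at (theta, u)."""
    return u - f_u(beta, u) / f_uu(theta, u)

theta, delta = 1.2, 1e-6
u = u_hat(theta)

# U[theta, theta, u_hat(theta)] should reproduce u_hat(theta)
U_value = U(theta, theta, u)

# Partial of U w.r.t. beta at beta = theta, with theta and u held fixed
U_beta_fd = (U(theta + delta, theta, u) - U(theta - delta, theta, u)) / (2.0 * delta)

# Derivative of u_hat from the implicit function theorem
f_u_theta = -1.0 + 2.0 * theta * u
u_hat_theta = -f_u_theta / f_uu(theta, u)

print(U_value - u, U_beta_fd - u_hat_theta)
```

The key point is that \(\theta\) and \(u\) are held fixed while differentiating with respect to \(\beta\), so no derivative of \(f_{u,u}^{-1}\) is needed.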

Second Order, W(beta, theta, u)¶

We define the function

\[W ( \beta , \theta , u ) = U( \beta , \theta , u ) - f_{u,u} ( \theta , u )^{-1} f_u [ \beta , U( \beta , \theta , u) ]\]

It follows that

\[W \left[ \theta , \theta , \hat{u} ( \theta ) \right] = \hat{u} ( \theta )\]

\[W_{\beta} [ \theta , \theta , \hat{u} ( \theta ) ] = \hat{u}_\theta ( \theta )\]

and for random effects indices \(i\),

\[W^i_{\beta \beta} [ \theta , \theta , \hat{u} ( \theta ) ] = \hat{u}^i_{\theta , \theta} ( \theta )\]
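Applying the Newton step twice is what makes \(W\) agree with \(\hat{u}\) through second order in \(\beta\). The sketch below compares second central differences of \(W\) (in \(\beta\)) and of \(\hat{u}\) (in \(\theta\)), assuming the scalar test function \(f( \theta , u ) = e^u - \theta u + \theta^2 u^2 / 2\). The test function and all names are illustrative assumptions.

```python
import math

# Assumed test function: f(theta, u) = exp(u) - theta*u + 0.5*theta^2*u^2

def f_u(theta, u):
    return math.exp(u) - theta + theta ** 2 * u

def f_uu(theta, u):
    return math.exp(u) + theta ** 2

def u_hat(theta):
    """Solve f_u(theta, u) = 0 by Newton's method."""
    u = 0.0
    for _ in range(50):
        step = f_u(theta, u) / f_uu(theta, u)
        u -= step
        if abs(step) < 1e-15:
            break
    return u

def U(beta, theta, u):
    """First Newton step, Hessian frozen at (theta, u)."""
    return u - f_u(beta, u) / f_uu(theta, u)

def W(beta, theta, u):
    """Second Newton step, starting from U and reusing the frozen Hessian."""
    U_value = U(beta, theta, u)
    return U_value - f_u(beta, U_value) / f_uu(theta, u)

def second_difference(func, x, delta):
    """Central second difference approximation to func''(x)."""
    return (func(x + delta) - 2.0 * func(x) + func(x - delta)) / delta ** 2

theta, delta = 1.2, 1e-3
u = u_hat(theta)

# Second derivative of W w.r.t. beta with (theta, u) fixed at (theta, u_hat(theta))
W_beta_beta = second_difference(lambda beta: W(beta, theta, u), theta, delta)

# Second derivative of the true optimal random effect w.r.t. theta
u_hat_theta_theta = second_difference(u_hat, theta, delta)

print(W_beta_beta, u_hat_theta_theta)
```

Repeating the comparison with `U` in place of `W` gives a different second derivative in general, because a single Newton step is only first order accurate in \(\beta\).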

Approximate Laplace Objective, H(beta, theta, u)¶

Given these facts we define

\[H( \beta , \theta , u) = + \frac{1}{2} \log \det f_{u,u} [ \beta, W( \beta , \theta , u) ] + f[ \beta, U( \beta , \theta , u) ] - \frac{n}{2} \log ( 2 \pi )\]

It follows that

\[r_{\theta,\theta} ( \theta ) = H_{\beta,\beta} \left[ \theta , \theta , \hat{u} ( \theta ) \right]\]
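To illustrate this identity, the sketch below compares a second difference of \(H\) in \(\beta\) with a second difference of \(r\) in \(\theta\). It assumes the scalar test function \(f( \theta , u ) = e^u - \theta u + \theta^2 u^2 / 2\), which is not from the original text. Note that \(r\) requires re-optimizing the random effect at each perturbed \(\theta\), while \(H\) only uses the optimum at the central point.

```python
import math

# Assumed test function: f(theta, u) = exp(u) - theta*u + 0.5*theta^2*u^2

def f(theta, u):
    return math.exp(u) - theta * u + 0.5 * theta ** 2 * u ** 2

def f_u(theta, u):
    return math.exp(u) - theta + theta ** 2 * u

def f_uu(theta, u):
    return math.exp(u) + theta ** 2

def u_hat(theta):
    """Solve f_u(theta, u) = 0 by Newton's method."""
    u = 0.0
    for _ in range(50):
        step = f_u(theta, u) / f_uu(theta, u)
        u -= step
        if abs(step) < 1e-15:
            break
    return u

def U(beta, theta, u):
    return u - f_u(beta, u) / f_uu(theta, u)

def W(beta, theta, u):
    U_value = U(beta, theta, u)
    return U_value - f_u(beta, U_value) / f_uu(theta, u)

LOG_2PI = math.log(2.0 * math.pi)

def r(theta):
    """Laplace objective h[theta, u_hat(theta)] with n = 1."""
    u = u_hat(theta)
    return 0.5 * math.log(f_uu(theta, u)) + f(theta, u) - 0.5 * LOG_2PI

def H(beta, theta, u):
    """Approximate Laplace objective: W in the log-det term, U in the f term."""
    log_det_term = 0.5 * math.log(f_uu(beta, W(beta, theta, u)))
    return log_det_term + f(beta, U(beta, theta, u)) - 0.5 * LOG_2PI

def second_difference(func, x, delta):
    """Central second difference approximation to func''(x)."""
    return (func(x + delta) - 2.0 * func(x) + func(x - delta)) / delta ** 2

theta, delta = 1.2, 1e-3
u = u_hat(theta)

H_beta_beta = second_difference(lambda beta: H(beta, theta, u), theta, delta)
r_theta_theta = second_difference(r, theta, delta)
print(H_beta_beta, r_theta_theta)
```

The first order approximation \(U\) suffices inside \(f\) because \(f_u\) is zero at \(\hat{u} ( \theta )\), so the second order error in \(U\) does not affect the Hessian; the log-determinant term has no such cancellation and uses \(W\).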

Approximate Random Constraint Function, B(beta, theta, u)¶

We also define the approximate random constraint function

\[B( \beta , \theta , u) = A \; W( \beta , \theta , u )\]

Hessian of Laplace Objective¶

Note that the Hessian of the Laplace objective \(r_{\theta,\theta} ( \theta )\) is required when quasi_fixed is false. In this case, the representation

\[r_{\theta,\theta} ( \theta ) = H_{\beta,\beta} \left[ \theta , \theta , \hat{u} ( \theta ) \right]\]

is used to compute this Hessian.

Hessian of Random Constraints¶

In the case where quasi_fixed is false we need to compute second derivatives of the random constraint function. We use \(A^i\) to denote row \(i\) of the random constraint matrix and \(B^i\) to denote the corresponding component of the approximate random constraint function. The Hessian of the random constraints can be computed using the formula

\[\partial_\theta \partial_\theta [ A^i \; \hat{u} ( \theta ) ] = B^i_{\beta,\beta} \left[ \theta , \theta , \hat{u} ( \theta ) \right]\]

Sparse Observed Information¶

Suppose that \(H\) is a sparse positive definite Hessian of a likelihood at the maximum likelihood estimate for its unknown parameters. The corresponding asymptotic posterior distribution of the parameters is normal with covariance \(H^{-1}\). A vector \(v\) with this covariance can be simulated as

\[v = R w\]

where \(R\) is defined by \(H^{-1} = R R^\R{T}\) and \(w\) is a normal random vector with mean zero and identity covariance. Suppose we have a sparse factorization of the form

\[L D L^\R{T} = P H P^\R{T}\]

where \(L\) is lower triangular, \(D\) is diagonal, and \(P\) is a permutation matrix. It follows that

\[H = P^\R{T} L D L^\R{T} P\]

\[H^{-1} = P^\R{T} L^{-\R{T}} D^{-1} L^{-1} P\]

\[R = P^\R{T} L^{-\R{T}} D^{-1/2}\]

\[v = P^\R{T} L^{-\R{T}} D^{-1/2} w\]

If \(w\) is simulated as a normal random vector with mean zero and identity covariance, and \(v\) is computed using this formula, the mean of \(v\) is zero and its covariance is given by

\[\B{E}[ v v^\R{T} ] = H^{-1}\]
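The simulation formula can be sketched as follows. This example uses a small dense matrix and a hand-written dense \(L D L^\R{T}\) routine as a stand-in for a sparse factorization; the matrix, the permutation, and the function names `ldlt` and `simulate` are all assumptions made for illustration. Rather than estimating the covariance by sampling, it builds \(R\) column by column and checks that \(H R R^\R{T}\) is the identity.

```python
import random

def ldlt(A):
    """Dense L D L^T factorization of a symmetric positive definite matrix A.
    Returns unit lower triangular L (list of rows) and the diagonal D (list)."""
    n = len(A)
    L = [[float(i == j) for j in range(n)] for i in range(n)]
    D = [0.0] * n
    for j in range(n):
        D[j] = A[j][j] - sum(L[j][k] ** 2 * D[k] for k in range(j))
        for i in range(j + 1, n):
            s = sum(L[i][k] * L[j][k] * D[k] for k in range(j))
            L[i][j] = (A[i][j] - s) / D[j]
    return L, D

def simulate(L, D, perm, w):
    """Return v = P^T L^{-T} D^{-1/2} w, where (P x)[i] = x[perm[i]]."""
    n = len(D)
    x = [w[i] / D[i] ** 0.5 for i in range(n)]  # x = D^{-1/2} w
    z = x[:]
    for i in reversed(range(n)):                # back substitution: L^T z = x
        z[i] = x[i] - sum(L[k][i] * z[k] for k in range(i + 1, n))
    v = [0.0] * n
    for i in range(n):                          # apply P^T
        v[perm[i]] = z[i]
    return v

# Assumed example: a small symmetric positive definite H and a permutation
H = [[4.0, 1.0, 0.0],
     [1.0, 3.0, 1.0],
     [0.0, 1.0, 2.0]]
perm = [2, 0, 1]
n = len(H)

# Factor P H P^T = L D L^T
PHPt = [[H[perm[i]][perm[j]] for j in range(n)] for i in range(n)]
L, D = ldlt(PHPt)

# Column k of R is the image of the k-th unit vector
cols = [simulate(L, D, perm, [float(i == k) for i in range(n)]) for k in range(n)]
RRt = [[sum(cols[k][i] * cols[k][j] for k in range(n)) for j in range(n)]
       for i in range(n)]

# H (R R^T) should be the identity matrix
prod = [[sum(H[i][k] * RRt[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)]
max_error = max(abs(prod[i][j] - float(i == j)) for i in range(n) for j in range(n))

# One simulated draw with covariance H^{-1}
random.seed(0)
w = [random.gauss(0.0, 1.0) for _ in range(n)]
v = simulate(L, D, perm, w)
print(max_error, v)
```

In a sparse setting the permutation is typically chosen by the factorization to reduce fill-in, and the back substitution only visits the non-zero entries of \(L\).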