\(\newcommand{\B}[1]{ {\bf #1} }\) \(\newcommand{\R}[1]{ {\rm #1} }\) \(\newcommand{\W}[1]{ \; #1 \; }\)
theory¶
View page source\(\newcommand{\dtheta}[1]{ \frac{\R{d}}{\R{d} \theta_{ #1}} }\)
Laplace Approximation for Mixed Effects Models¶
Reference¶
TMB: Automatic Differentiation and Laplace Approximation, Kasper Kristensen, Anders Nielsen, Casper W. Berg, Hans Skaug, Bradley M. Bell, Journal of Statistical Software 70, 1-21 April 2016.
Total Likelihood¶
The reference above defines \(f( \theta, u)\) to be the negative log-likelihood of the \(z\), \(y\), \(u\) and \(\theta\); i.e.,
Random Likelihood, f(theta, u)¶
We use \(f( \theta , u )\) for the part of the likelihood that depends on the random effects \(u\);
Assumption¶
The function \(f(\theta, u)\) is assumed to be smooth. Furthermore, there are no constraints on the value of \(u\).
Random Effects Objective¶
Give a value for the fixed effects \(\theta\), the random effects objective is the random likelihood as just a function of the random effects; i.e., \(f( \theta , \cdot )\).
Fixed Likelihood, g(theta)¶
We use \(g( \theta )\) for the part of the likelihood that only depends on the fixed effects \(\theta\);
The function \(g( \theta )\) may not be smooth, to be specific, it can have absolute values in it (corresponding to the Laplace densities). Furthermore, there may be constraints on the value of \(\theta\).
Optimal Random Effects, u^(theta)¶
Given the fixed effects \(\theta\), we use \(\hat{u} ( \theta )\) to denote the random effects that maximize the random likelihood; i.e.,
Note that this definition agrees with the other definition for u^(theta) .
Objective¶
Laplace Approximation, h(theta, u)¶
Using the notation above, the Laplace approximation as a function of both the fixed and random effects is
where \(n\) is the number of random effects.
Laplace Objective, r(theta)¶
We refer to
as the Laplace objective. This corresponds to equation (4) in the Reference .
Fixed Effects Objective, L(theta)¶
The fixed effects objective, as a function of just the fixed effects, is
Derivative of Optimal Random Effects¶
Because \(f(\theta, u)\) is smooth, and \(\hat{u} ( \theta )\) is optimal w.r.t \(u\), we obtain
From this equation, and the implicit function theorem, it follows that
Derivative of Random Constraints¶
The derivative of the random constraint function is given by
Derivative of Laplace Objective¶
The derivative of the random part of the objective is given by
Thus the derivative of \(r ( \theta )\) can be computed using the derivative of \(\hat{u} ( \theta )\) and the partials of \(h( \theta , u )\). Let \(\partial_k\) denote the partial with respect to the k-th component of the combined vector \(( \theta , u )\).
where \(n\) is the number of random effects. Note that \(f_{u,u} ( \theta , u )\) is often sparse and only non-zero components need be included in the summation. This is discussed in more detail near equation (8) in the Reference . We also note that if \(k\) corresponds to a component of \(u\) then
Approximate Optimal Random Effects¶
First Order, U(beta, theta, u)¶
We define the function
It follows that
Second Order, W(beta, theta, u)¶
We define the function
It follows that
and for random effects indices \(i\),
Approximate Laplace Objective, H(beta, theta, u)¶
Given these facts we define
It follow that
Approximate Random Constraint Function, B(beta, theta, u)¶
We also define the approximation random constraint function
Hessian of Laplace Objective¶
Note that the Hessian of the Laplace objective \(r_{\theta,\theta} ( \theta )\) is required when quasi_fixed is false. In this case, the representation
is used to compute this Hessian.
Hessian of Random Constraints¶
In the case where quasi_fixed is false we need to compute second derivatives of the random constraint function. We use \(A^i\) ( \(B^i\)) to denote one of the rows of the random constraint matrix ( approximate random constraint function ). The Hessian of the random constraints can be computed using the formula
Sparse Observed Information¶
Suppose that \(H\) is a sparse positive definite Hessian of a likelihood at the maximum likelihood estimate for its unknown parameters. The corresponding asymptotic covariance for posterior distribution of the parameters is normal with covariance \(H^{-1}\). A vector \(v\) with this covariance can be simulated as
where \(R\) is defined by \(H^{-1} = R R^\R{T}\) and \(w\) is a normal with mean zero and the identity covariance. Suppose we have a sparse factorization of the form
where \(L\) is lower triangular, \(D\) is diagonal, and \(P\) is a permutation matrix. It follows that
If \(w\) is simulated as a normal random vector with mean zero and identity covariance, and \(v\) is computed using this formula, the mean of \(v\) is zero and its covariance is given by