Hey, so there are a couple of things at play here. Your question has INLA in the title, which is a bit different from the vanilla Laplace approximation. I’ll answer the question based on the Laplace approximation and then explain the differences with INLA at the end.
Laplace approximation
The Laplace approximation approximates the posterior density by fitting a multivariate Gaussian (normal) to all the parameters. The mean vector is the MAP estimate (the mode, found by minimising the negative log-posterior density), and the precision matrix (the inverse covariance) is the Hessian of the negative log-posterior with respect to the parameters, evaluated at that mode.
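In symbols, writing θ for all the parameters and θ̂ for the mode, the approximation is

$$
p(\theta \mid y) \;\approx\; \mathcal{N}\!\left(\theta \,\middle|\, \hat{\theta},\; H^{-1}\right),
\qquad
H = -\nabla^2 \log p(\theta \mid y)\,\Big|_{\theta = \hat{\theta}}.
$$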
Don’t worry too much about the details, but since you are fitting a Gaussian, this becomes exact when the posterior is Gaussian and is a good approximation unless the posterior is multimodal (many MCMC methods also struggle with this) or highly skewed.
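If it helps to see the recipe written out, here is a minimal from-scratch sketch on a toy coin-flip model; the data, the prior, and the finite-difference Hessian are all just illustrative choices, not any particular library’s implementation.

```python
import numpy as np
from scipy import optimize, stats

y = np.array([1, 0, 1, 1, 0, 1, 1, 1])              # hypothetical coin-flip data

def neg_log_posterior(theta):
    """Negative log-posterior for a toy model: standard normal prior on the
    log-odds theta, Bernoulli likelihood for the observed flips."""
    t = np.squeeze(theta)
    p = 1.0 / (1.0 + np.exp(-t))
    log_lik = np.sum(y * np.log(p) + (1 - y) * np.log1p(-p))
    return -(log_lik + stats.norm.logpdf(t))

# 1. The mean of the Gaussian: the MAP estimate, found by minimising
#    the negative log-posterior.
res = optimize.minimize(neg_log_posterior, x0=np.array([0.0]))
theta_hat = res.x[0]

# 2. The precision: the Hessian (here just a second derivative) of the
#    negative log-posterior at the mode, via central finite differences.
eps = 1e-4
precision = (neg_log_posterior(theta_hat + eps)
             - 2 * neg_log_posterior(theta_hat)
             + neg_log_posterior(theta_hat - eps)) / eps**2

# 3. The Laplace approximation: a Gaussian centred at the MAP with
#    covariance equal to the inverse of that precision.
laplace = stats.norm(loc=theta_hat, scale=np.sqrt(1.0 / precision))
print("approximate posterior on the log-odds:", laplace.mean(), laplace.std())
```

The same three steps (minimise, take the Hessian at the minimum, read off a Gaussian) carry over unchanged to models with many parameters, where the precision becomes a full matrix.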
Richard McElreath uses the Laplace approximation (he calls it the quadratic approximation) in the early parts of Statistical Rethinking, and I’m sure he explains it more clearly than I can. If you are worried about whether it is a good approximation for your model, and whether the speed gain is worth using an approximation at all, try fitting the model with both MCMC and the Laplace approximation. If the Laplace approximation produces a similar posterior to MCMC, then happy days.
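For example, something along these lines (a rough sketch; I’m assuming `pm.find_MAP` and `pm.find_hessian` behave as in the Statistical Rethinking PyMC ports, i.e. `find_hessian` returns the Hessian of the negative log-posterior, so double-check both against your PyMC version):

```python
import numpy as np
import pymc as pm
from scipy import stats

y = np.random.default_rng(1).normal(loc=1.0, scale=2.0, size=50)  # hypothetical data

with pm.Model():
    mu = pm.Normal("mu", 0.0, 10.0)
    pm.Normal("obs", mu, 2.0, observed=y)

    # MCMC posterior for mu
    idata = pm.sample(1000, chains=2, progressbar=False)

    # Laplace approximation: Gaussian centred at the MAP, covariance from the
    # inverse Hessian at the mode (check the sign convention of find_hessian
    # in your PyMC version).
    map_est = pm.find_MAP()
    precision = pm.find_hessian(map_est, vars=[mu])
    laplace = stats.norm(float(map_est["mu"]), np.sqrt(np.linalg.inv(precision)[0, 0]))

# If these two summaries agree, the Laplace approximation is probably fine here.
print("MCMC:   ", idata.posterior["mu"].mean().item(), idata.posterior["mu"].std().item())
print("Laplace:", laplace.mean(), laplace.std())
```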
Integrated nested Laplace approximation (INLA)
With INLA, we approximate the marginal posterior distribution of some subset of the parameters using a Laplace approximation (this is referred to as the marginal Laplace approximation), and then integrate out the remaining parameters using another method.
- Integrated. The remaining parameters are integrated out using numerical integration.
- Nested. Because we need p(θ∣y) to get p(u∣y) (the integral is written out just after this list).
- Laplace approximations. The method used to obtain the mean and precision of the Gaussian approximations.
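Concretely, with u the parameters we approximate with the Laplace step and θ the remaining parameters, the marginal we are after is

$$
p(u \mid y) \;=\; \int p(u \mid \theta, y)\, p(\theta \mid y)\, d\theta
\;\approx\; \sum_{k} p(u \mid \theta_k, y)\, p(\theta_k \mid y)\, \Delta_k,
$$

where each p(u∣θ_k, y) comes from a Gaussian (Laplace) approximation and the weighted sum over the grid points θ_k is the numerical integration.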
So the difference is that we only use the Laplace approximation on some of the parameters, whose posterior we hope is close to Gaussian, and use some other, more flexible inference method (numerical integration, in the case of R-INLA) for the remaining parameters.
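To make that concrete, here is a toy sketch of the whole recipe (not the R-INLA or PyMC implementation) on a model where the conditional posterior of the latent parameter really is Gaussian, so the Laplace step is exact; the model, priors, grid, and all the names are purely illustrative.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
y = rng.normal(loc=1.0, scale=0.5, size=20)        # hypothetical data

def conditional_posterior_of_u(theta):
    """Exact Gaussian p(u | theta, y) for the toy model
    u ~ N(0, 1), y_i | u, theta ~ N(u, exp(theta)); returns (mean, variance)."""
    obs_var = np.exp(theta)
    post_prec = 1.0 + len(y) / obs_var              # prior precision + data precision
    return (np.sum(y) / obs_var) / post_prec, 1.0 / post_prec

def log_hyper_posterior(theta):
    """Unnormalised log p(theta | y): log of the joint at the conditional mode of u,
    minus the log of the Gaussian approximation at that mode (the Laplace trick)."""
    u_star, post_var = conditional_posterior_of_u(theta)
    obs_sd = np.sqrt(np.exp(theta))
    log_joint = (stats.norm.logpdf(u_star)                        # prior on u
                 + np.sum(stats.norm.logpdf(y, u_star, obs_sd))   # likelihood
                 + stats.norm.logpdf(theta, 0.0, 2.0))            # prior on theta
    return log_joint - stats.norm.logpdf(u_star, u_star, np.sqrt(post_var))

# "Integrated": evaluate p(theta | y) on a grid and normalise the weights.
theta_grid = np.linspace(-4.0, 2.0, 101)
log_w = np.array([log_hyper_posterior(t) for t in theta_grid])
weights = np.exp(log_w - log_w.max())
weights /= weights.sum()

# "Nested": mix the conditional Gaussians p(u | theta_k, y) with those weights
# to get the marginal posterior p(u | y).
u_grid = np.linspace(0.0, 2.0, 400)
p_u = np.zeros_like(u_grid)
for theta_k, w_k in zip(theta_grid, weights):
    mean_k, var_k = conditional_posterior_of_u(theta_k)
    p_u += w_k * stats.norm.pdf(u_grid, mean_k, np.sqrt(var_k))

posterior_mean_u = np.sum(weights * np.array([conditional_posterior_of_u(t)[0]
                                              for t in theta_grid]))
print("approximate posterior mean of u:", posterior_mean_u)
```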
As the R-INLA docs put it, INLA works well when we expect the “full conditional density for the latent field to be near Gaussian”, i.e. when the posterior density of the subset of parameters we approximate with the Laplace step is close to Gaussian.
PyMC work on INLA is currently paused (we need a pytensor optimiser and I need to sort my life out), but we are very, very open to contributions on this if people are interested in having it in PyMC. It’s not that far off, and the issue tracker is here.
I have some notes on INLA with further reading at the bottom. I also recommend Adam Howes’ thesis chapter on this, which builds these ideas up from scratch and compares them to MCMC.