How is the marginalized likelihood computed?

Computing the marginal likelihood (aka evidence)

p(\mathcal{D}) = \int p(\mathcal{D} | \theta) \, p(\theta) \, d\theta

is not straightforward (see https://arxiv.org/pdf/2005.08334.pdf for a review). Which method does pymc3 use? Where could I find a pointer to a paper or other resource?

Thanks in advance!

Best,
Jannes

I have this super dated post, Motif of the Mind | Junpeng Lao, PhD (but the idea still stands). Otherwise, if you sample with SMC you get marginal_likelihood as a byproduct.

Thanks @junpenglao, that blog’s been really insightful because it shows several ways of computing the marginalised likelihood with pymc3.

I also (finally) understand qualitatively why SMC yields the marginal likelihood as a byproduct:

It samples from a sequence of unnormalised functions \mathcal{L}^{(i)}(\theta) that gradually transform from prior to posterior via a temperature parameter \kappa. Here in log space:

\log \mathcal{L}^{(i)}(\theta) = \kappa^{(i)} \log p(\theta | \mathcal{D}) + (1 - \kappa^{(i)}) \log p(\theta) + \text{const}

Thus, samples drawn at \kappa^{(0)} = 0, i.e. from the prior, can be used to estimate the marginalised likelihood by averaging the likelihood over them (although the estimate will generally be noisy because many samples will be sitting in low-likelihood regions).
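
To make the prior-sampling idea concrete, here is a minimal sketch in plain Python (standard library only, no pymc3). Everything in it is my own assumption for illustration: a toy model with prior \theta ~ Normal(0, 1) and i.i.d. Normal(\theta, 1) observations, made-up data, and hypothetical function names. I picked this model because the evidence has a closed form, so the Monte Carlo estimate can be checked against it.

```python
import math
import random


def log_likelihood(theta, data):
    """log p(data | theta) for i.i.d. Normal(theta, 1) observations."""
    return sum(-0.5 * math.log(2 * math.pi) - 0.5 * (y - theta) ** 2 for y in data)


def log_mean_exp(values):
    """Numerically stable log of the mean of exp(values)."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values) / len(values))


def log_evidence_prior_sampling(data, n_samples=20000, seed=0):
    """Estimate log p(data) by averaging the likelihood over prior draws."""
    rng = random.Random(seed)
    # Assumed prior: theta ~ Normal(0, 1).
    log_liks = [log_likelihood(rng.gauss(0.0, 1.0), data) for _ in range(n_samples)]
    return log_mean_exp(log_liks)


def log_evidence_exact(data):
    """Closed form for this toy model: data ~ Normal(0, I + 1 1^T)."""
    n = len(data)
    s = sum(data)
    ss = sum(y * y for y in data)
    return (
        -0.5 * n * math.log(2 * math.pi)
        - 0.5 * math.log(1 + n)
        - 0.5 * (ss - s * s / (1 + n))
    )


data = [0.3, -0.5, 1.2, 0.8, 0.1]  # made-up observations
estimate = log_evidence_prior_sampling(data)
exact = log_evidence_exact(data)
print("prior-sampling estimate:", estimate)
print("exact log evidence:     ", exact)
```

With five data points the prior and posterior still overlap a lot, so the estimate is close to the exact value. With more data or more parameters the posterior concentrates, most prior draws contribute almost nothing, and the estimate gets much noisier.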

(Please let me know if I misunderstood the argument).

I am a bit fuzzy on the details as well, but I usually think of it as being like annealed importance sampling: SMC interpolates from the prior to the posterior, accumulating importance weights along the way. The product of these importance weights gives an unbiased estimate of the normalizing constant of the posterior (the marginal likelihood).
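
A rough sketch of that picture, in case it helps. This is not pymc3's SMC implementation. It is a toy tempered SMC in plain Python under my own assumptions: the same kind of conjugate Gaussian model as a test case (prior Normal(0, 1), Normal(\theta, 1) observations), a fixed linear temperature schedule, multinomial resampling at every step, and a few random-walk Metropolis moves per stage. All function and parameter names are hypothetical. Because the particles are resampled at every stage, the factor accumulated per stage is the mean of the incremental weights, and the product of those means is the evidence estimate.

```python
import math
import random


def log_likelihood(theta, data):
    """log p(data | theta) for i.i.d. Normal(theta, 1) observations."""
    return sum(-0.5 * math.log(2 * math.pi) - 0.5 * (y - theta) ** 2 for y in data)


def log_prior(theta):
    """Log density of the assumed Normal(0, 1) prior."""
    return -0.5 * math.log(2 * math.pi) - 0.5 * theta ** 2


def log_mean_exp(values):
    """Numerically stable log of the mean of exp(values)."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values) / len(values))


def smc_log_evidence(data, n_particles=2000, n_steps=20, n_mcmc=3, step=0.5, seed=0):
    """Tempered SMC estimate of log p(data) on a fixed linear beta schedule."""
    rng = random.Random(seed)
    particles = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]  # beta = 0: prior
    betas = [k / n_steps for k in range(n_steps + 1)]
    log_z = 0.0
    for beta_old, beta_new in zip(betas[:-1], betas[1:]):
        # Incremental importance weights: likelihood ** (beta_new - beta_old).
        log_w = [(beta_new - beta_old) * log_likelihood(t, data) for t in particles]
        # Each stage contributes the mean incremental weight to the estimate.
        log_z += log_mean_exp(log_w)
        # Multinomial resampling in proportion to the incremental weights.
        m = max(log_w)
        weights = [math.exp(w - m) for w in log_w]
        particles = rng.choices(particles, weights=weights, k=n_particles)
        # Random-walk Metropolis moves targeting prior * likelihood ** beta_new.
        moved = []
        for theta in particles:
            log_p = log_prior(theta) + beta_new * log_likelihood(theta, data)
            for _ in range(n_mcmc):
                prop = theta + rng.gauss(0.0, step)
                log_p_prop = log_prior(prop) + beta_new * log_likelihood(prop, data)
                if rng.random() < math.exp(min(0.0, log_p_prop - log_p)):
                    theta, log_p = prop, log_p_prop
            moved.append(theta)
        particles = moved
    return log_z


def log_evidence_exact(data):
    """Closed form for this toy model: data ~ Normal(0, I + 1 1^T)."""
    n = len(data)
    s = sum(data)
    ss = sum(y * y for y in data)
    return (
        -0.5 * n * math.log(2 * math.pi)
        - 0.5 * math.log(1 + n)
        - 0.5 * (ss - s * s / (1 + n))
    )


data = [0.3, -0.5, 1.2, 0.8, 0.1]  # made-up observations
smc_estimate = smc_log_evidence(data)
exact = log_evidence_exact(data)
print("SMC estimate:      ", smc_estimate)
print("exact log evidence:", exact)
```

The unbiasedness holds for the evidence itself, not for its logarithm, so the log estimate printed here carries a small bias that shrinks as the number of particles grows.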

Makes sense, thanks for the pointers!