Hello, I am new to PyMC3. After studying the tutorial, my understanding is that in the model specification we usually only need to set up the prior information and the model equations (like a regression formula), and we can then use a selected MCMC algorithm, such as HMC, to fit the model and obtain the posterior distributions. We do not need to write down complicated conditional distributions or a Gibbs sampling scheme, as we usually do in classical Bayesian analysis. Is this understanding correct?
That’s exactly right!
Hi bwengals, thanks for the reply. Would you be willing to share any insight or reference material on why PyMC3 can avoid asking modelers to derive those complex posterior distributions, which can be very challenging in the traditional Bayesian framework? For instance, in PyMC3, after the model is set up, it can automatically invoke HMC (using the NUTS algorithm) to fit the model. How does it bridge this magic gap?
The developer guide describes in detail how PyMC3 currently works, though there will be some substantial changes with v4.
Others on the forum understand the internals much better than I do, but I’ll try my best to roughly describe how it works. To do Metropolis or HMC, you only need to have on hand a function that is proportional to the log probability of the posterior distribution, logp: the log of the product of the likelihood and the priors, p(y \mid \theta)\, p(\theta). Each distribution specified in a PyMC3 program adds its contribution to the total logp. Take this example of estimating the mean of data assumed to be normally distributed:
```python
import pymc3 as pm

y = [1, 2, 3]  # true theta is 2

with pm.Model() as model:
    theta = pm.Normal('theta', mu=0, sd=1)               # prior
    sd = 1.1
    lik = pm.Normal('lik', mu=theta, sd=sd, observed=y)  # likelihood
```
Say we are running Metropolis and we have a proposed value of theta, say theta = 0.5. PyMC3 calculates

logp = \log N(0.5 \mid 0, 1) + \log N(1 \mid 0.5, 1.1) + \log N(2 \mid 0.5, 1.1) + \log N(3 \mid 0.5, 1.1),

and then the step can be accepted or rejected.
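To make that sum concrete, here is a minimal sketch (plain NumPy, not PyMC3 itself) that evaluates the same logp by hand. The `norm_logpdf` helper is just the standard normal log-density formula; the four terms correspond exactly to the four terms above:

```python
import numpy as np

def norm_logpdf(x, mu, sd):
    """Log density of a Normal(mu, sd) evaluated at x."""
    return -0.5 * np.log(2 * np.pi) - np.log(sd) - (x - mu) ** 2 / (2 * sd ** 2)

y = np.array([1.0, 2.0, 3.0])
theta = 0.5  # the proposed value

# prior contribution: log N(theta | 0, 1)
logp = norm_logpdf(theta, 0.0, 1.0)
# likelihood contributions: log N(y_i | theta, 1.1), one per observation
logp += norm_logpdf(y, theta, 1.1).sum()

print(logp)  # total log posterior (up to a constant) at theta = 0.5
```

A Metropolis step would then compare this value against logp at the current theta to decide acceptance.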
Under the hood, each distribution is added to a static computational graph using Theano, so PyMC3 ‘knows’ which distributions are priors and which are likelihoods (likelihoods have observed set, priors don’t), and it can take gradients of logp with respect to \theta for HMC. While you specify your PyMC3 program once, this graph/function is repeatedly run to evaluate logp or its gradient at different values of \theta, or to draw samples. So the sum of \log N(\cdot) is computed from that graph, not by rerunning the model code you write in the with pm.Model() as model: block.
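As a rough illustration of the gradient part (this is hand-written NumPy for this one model; Theano instead derives the gradient symbolically from the graph): the derivative of logp with respect to \theta has a closed form here, and we can check it against a finite-difference approximation:

```python
import numpy as np

y = np.array([1.0, 2.0, 3.0])
sd = 1.1

def logp(theta):
    """Log posterior up to a constant: N(0, 1) prior plus Normal likelihood."""
    prior = -0.5 * theta ** 2
    lik = -((y - theta) ** 2).sum() / (2 * sd ** 2)
    return prior + lik  # additive constants dropped; they don't affect the gradient

def dlogp(theta):
    """Analytic gradient: d/dtheta of the expression above."""
    return -theta + (y - theta).sum() / sd ** 2

theta = 0.5
eps = 1e-6
fd = (logp(theta + eps) - logp(theta - eps)) / (2 * eps)  # central finite difference

print(dlogp(theta), fd)  # the two should agree closely
```

HMC uses exactly this kind of gradient information to propose distant states that are still likely to be accepted, which is why the automatic differentiation matters.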