Hello, I am new to PyMC3. After studying the tutorial, my understanding is that in the model specification we usually only need to set up the prior information and the model equations, like a regression formula, and we can then use a selected MCMC algorithm, like HMC, to fit the model and get the posterior distributions. We do not need to write down complicated conditional distributions or a Gibbs sampling scheme, as we usually do in classical Bayesian analysis. Is this understanding correct?
That’s exactly right!
Hi bwengals, thanks for the reply. Could you share any insight or reference material on why PyMC3 can avoid asking modelers to derive those complex conditional posterior distributions, which can be very challenging in the traditional Bayesian framework? For instance, in PyMC3, after the model is set up, it can automatically invoke HMC (using the NUTS algorithm) to fit the model. How does it bridge this gap?
The developer guide describes in detail how PyMC3 currently works, though there will be some substantial changes with v4.
Others on the forum understand the internals much better than I do, but I’ll try my best to roughly describe how it works. To do Metropolis or HMC, you only need to have on hand a function that computes the log of something proportional to the posterior density, logp: the product of the likelihood and the priors, p(y | \theta) p(\theta). Each distribution specified in a PyMC3 program adds its contribution to the total logp. Take this example of estimating the mean of data assumed to be normally distributed:
import numpy as np
import pymc3 as pm

y = np.array([1, 2, 3])  # observed data; true theta is 2

with pm.Model() as model:
    # Prior on the unknown mean
    theta = pm.Normal('theta', mu=0, sd=1)
    # Known observation noise
    sd = 1.1
    # Likelihood: 'observed' connects the model to the data y
    lik = pm.Normal('lik', mu=theta, sd=sd, observed=y)
Say we are running Metropolis, and we have a theta proposed, say theta = 0.5. PyMC3 calculates logp = \log N(0.5 | 0, 1) + \log N(1 | 0.5, 1.1) + \log N(2 | 0.5, 1.1) + \log N(3 | 0.5, 1.1), and then the step can be accepted or rejected.
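To make that concrete, you can evaluate the model’s compiled logp yourself and check it against the same sum written out with scipy. A quick sketch, assuming the model object defined above is still in scope (printed values are approximate):

import scipy.stats as st

# PyMC3's compiled log posterior (up to a constant), evaluated
# at the proposed point theta = 0.5:
print(model.logp({'theta': 0.5}))  # ≈ -7.70

# The same quantity by hand: one prior term plus one likelihood
# term per observed data point.
by_hand = (st.norm.logpdf(0.5, loc=0.0, scale=1.0)
           + st.norm.logpdf([1, 2, 3], loc=0.5, scale=1.1).sum())
print(by_hand)  # ≈ -7.70

# A Metropolis step would compare this to logp at the current theta,
# accepting with probability min(1, exp(logp_proposed - logp_current)).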
Under the hood, each distribution is added to a static computational graph using Theano, so PyMC3 ‘knows’ which distributions are priors and which are likelihoods (likelihoods have observed set, priors don’t), and it can take gradients of logp with respect to \theta for HMC. While you specify your PyMC3 program once, this graph/function is repeatedly run to evaluate logp or its gradient at different values of \theta, or to draw samples. So the sum of \log N(...) terms is computed from that graph, not from the model code that you write in the with pm.Model() as model: block.
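For example, the samplers work with a compiled function that returns logp and its gradient together. A rough sketch using the 3.x API described in the developer guide, continuing from the model above (the exact interface may change in v4):

import numpy as np

# Compile a single function returning (logp, dlogp/dtheta), evaluated
# on a raveled array of the free parameters (here just theta).
func = model.logp_dlogp_function()
func.set_extra_values({})  # this model has no extra shared variables

logp_val, dlogp_val = func(np.array([0.5]))
print(logp_val)   # same ≈ -7.70 as the hand-computed sum above
print(dlogp_val)  # gradient at theta = 0.5: -0.5 + 4.5/1.1**2 ≈ [3.22]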