Hello, I am new to PyMC3. After studying the tutorial, my understanding is that in the model specification we usually only need to set up the prior information and the model equations (like a regression formula), and we can then use a selected MCMC algorithm, such as HMC, to fit the model and obtain the posterior distributions. We do not need to write down complicated conditional distributions or a Gibbs sampling scheme, as we usually do in classical Bayesian analysis. Is this understanding correct?
That’s exactly right!
Hi bwengals, thanks for the reply. Would you be willing to share any insight or reference material on why PyMC3 can avoid asking modelers to derive those complex posterior distributions, which can be very challenging in the traditional Bayesian framework? For instance, in PyMC3, after the model is set up, it can automatically invoke HMC (using the NUTS algorithm) to fit the model. How does it bridge this magic gap?
The developer guide describes in detail how PyMC3 currently works, though there will be some substantial changes with v4.
Others on the forum understand the internals much better than I do, but I’ll try my best to roughly describe how it works. To do Metropolis or HMC, you only need to have on hand a function that is proportional to the log probability of the posterior distribution, logp: the log of the product of the likelihood and the priors, p(y \mid \theta)\, p(\theta). Each distribution specified in a PyMC3 program adds its contribution to the total logp. Take this example of estimating the mean of data assumed to be normally distributed:
```python
import pymc3 as pm

y = [1, 2, 3]  # true theta is 2

with pm.Model() as model:
    theta = pm.Normal('theta', mu=0, sd=1)               # prior
    sd = 1.1
    lik = pm.Normal('lik', mu=theta, sd=sd, observed=y)  # likelihood
```
Say we are running Metropolis and we have a proposed value of theta, say theta = 0.5. PyMC3 calculates

logp = \log N(0.5 \mid 0, 1) + \log N(1 \mid 0.5, 1.1) + \log N(2 \mid 0.5, 1.1) + \log N(3 \mid 0.5, 1.1),

and then the step can be accepted or rejected.
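To make that sum concrete, here is a minimal sketch (plain NumPy, not PyMC3 itself) that evaluates the same logp by hand. The `norm_logpdf` helper is just the standard normal log-density formula; the four terms correspond exactly to the four terms above:

```python
import numpy as np

def norm_logpdf(x, mu, sd):
    """Log density of a Normal(mu, sd) evaluated at x."""
    return -0.5 * np.log(2 * np.pi) - np.log(sd) - (x - mu) ** 2 / (2 * sd ** 2)

y = np.array([1.0, 2.0, 3.0])
theta = 0.5  # the proposed value

# prior contribution: log N(theta | 0, 1)
logp = norm_logpdf(theta, 0.0, 1.0)
# likelihood contributions: log N(y_i | theta, 1.1), one per observation
logp += norm_logpdf(y, theta, 1.1).sum()

print(logp)  # total log posterior (up to a constant) at theta = 0.5
```

A Metropolis step would then compare this value against logp at the current theta to decide acceptance.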
Under the hood, each distribution is added to a static computational graph using Theano, so PyMC3 ‘knows’ which distributions are priors and which are likelihoods (likelihoods have observed set, priors don’t), and it can take gradients of logp with respect to \theta for HMC. While you specify your PyMC3 program once, this graph/function is repeatedly run to evaluate logp or its gradient at different values of \theta, or to draw samples. So the sum of \log N(\cdot) is computed from that graph, not by rerunning the model code you write in the with pm.Model() as model: block.
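As a rough illustration of the gradient part (this is hand-written NumPy for this one model; Theano instead derives the gradient symbolically from the graph): the derivative of logp with respect to \theta has a closed form here, and we can check it against a finite-difference approximation:

```python
import numpy as np

y = np.array([1.0, 2.0, 3.0])
sd = 1.1

def logp(theta):
    """Log posterior up to a constant: N(0, 1) prior plus Normal likelihood."""
    prior = -0.5 * theta ** 2
    lik = -((y - theta) ** 2).sum() / (2 * sd ** 2)
    return prior + lik  # additive constants dropped; they don't affect the gradient

def dlogp(theta):
    """Analytic gradient: d/dtheta of the expression above."""
    return -theta + (y - theta).sum() / sd ** 2

theta = 0.5
eps = 1e-6
fd = (logp(theta + eps) - logp(theta - eps)) / (2 * eps)  # central finite difference

print(dlogp(theta), fd)  # the two should agree closely
```

HMC uses exactly this kind of gradient information to propose distant states that are still likely to be accepted, which is why the automatic differentiation matters.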