# Samples from Conditional Posterior Distribution

Let us consider the following Hierarchical Bayesian model:

• w \sim\ Beta(20, 20)
• K = 6
• a = w * (K - 2) + 1
• b = (1 - w) * (K - 2) + 1
• theta \sim\ Beta(a, b)
• y \sim\ Bern(theta)

The above is the example of figure 9.3 in the book Doing Bayesian Data Analysis by John Kruschke.

I use the pymc3 to construct the model as follows:

import numpy as np
import pymc3 as pm
import arviz as az

K = 6
D = np.array([1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0])

with pm.Model() as mdl:

w = pm.Beta('w', alpha=20., beta =20.)
a = w * (K - 2) + 1
b = (1 - w) * (K - 2) + 1
theta = pm.Beta('theta', alpha=a, beta=b)
y = pm.Bernoulli('y', p=theta, observed=D)
trace = pm.sample(10000, tune=3000)


My question is the following:

• How can I compute the conditional posterior probability p(theta|w=0.25,D)? In other words how can I obtain samples from the conditional posterior distribution of theta|w=0.25, D?

Can you just set w=0.25?

You can indeed just set w=0.25 and re-sample the model. In this case the model is simple and sampling is cheap, so it’s probably the way to go. I just wanted to add that if the model were expensive to sample, you could also get conditional probabilities from the joint posterior using boolean masks. The only wrinkle is that, since w is a continuous variable, you’re not going to have any samples at exactly w=0.25 from which to construct the conditional distribution. To get around this, I binned the values of w into chunks of width 0.1, so I built p(\theta | w \in [0.2, 0.3), D):

bins = np.linspace(0.2, 0.8, 7)
axes = az.plot_pair(trace, var_names=['w', 'theta'], marginals=True)
main_plot = axes
ymin, ymax = main_plot.get_ylim()
main_plot.vlines(bins, ymin=ymin, ymax=ymax, ls='--', color='k', zorder=3)


The conditional probability you are interested in is the cluster of points in the first bin, between 0.2 and 0.3. It’s pretty far out on the tail, so there isn’t much to go on (hence the wider bin size).

You can grab values of theta in that bin, and then sample from these points as your conditional posterior:

w_bins = np.digitize(post['w'], bins=bins)
theta_given_w25 = post['theta'][w_bins == 1]


For comparison, this “tabular” method gives the conditional posterior mean of theta as 0.6141472 and std of 0.10960547, while resampling the model with w=0.25 gives the mean as 0.611 and std of 0.113, so you can see they are roughly equivalent.

1 Like

Thank you very much for the great answer.

I fully understand the second way using boolean masks. Actually that is what I thought at first. The problem is if I wanted the conditional probability at w = 0.10 where I haven’t any samples?

Also, by re-sample the model do you mean to re-run the full code by setting w = 0.25?

The conditional probability at w = 0.1 might just be 0. Thems the breaks.

Exactly, replace w = pm.Beta('w', alpha=20., beta =20.) with w = 0.25. The sampler started complaining about convergence when I did it, which is normal because we know we’re in a very flat region of the likelihood space (i.e. there’s not much information out there for the sampler to go on). I imagine if you put in w=0.1 it will crash and burn. But it’s worth a try!

Thank you very much.

When I run the sampler with w = 0.10 and computed the average of the samples actually it returned 0.5779 which is very very close to the value computed from grid approximation (0.5778) .

1 Like