Hello, I’m trying to implement a custom Gibbs sampler in PyMC3. I can’t figure out a way to specify my sampler that’s simple and idiomatic, and I’m wondering if I’m missing the right way to do it. It seems like Gibbs sampling isn’t what PyMC is designed for, so maybe that’s it.
Below is some code I wrote without PyMC that implements a Gibbs sampler for the posterior of the population genetics parameters `f` and `r`, given observations of organisms with different genotypes (`AA`, `Aa`, or `aa`). Basically, you sample a latent variable `Z` (whether or not an observation is inbred) conditioned on `f` and `r`, and then you sample `f` and `r` conditioned on `Z` by taking advantage of Beta-Binomial conjugacy:
```python
import numpy as np
from collections import defaultdict
import matplotlib.pyplot as plt

np.random.seed(123)

def gibbs_hw(niters, data,
             prior_params=(1, 1, 1, 1),
             initial_values={'f': 0.5, 'r': 0.5}):
    # Turn counts into a list of genotype strings.
    obs = np.array(['AA'] * data[0] + ['Aa'] * data[1] + ['aa'] * data[2])
    f = np.zeros(niters)
    r = np.zeros_like(f)
    f[0] = initial_values['f']
    r[0] = initial_values['r']
    for i in range(1, niters):
        # Z_i is whether the ith observation is inbred or not.
        # g_i is the genotype of the ith individual.
        # Calculate p(Z|f,r) for each case of g_i:
        zi_prob_map = {
            'AA': f[i-1] * r[i-1] / (f[i-1] * r[i-1] + (1 - f[i-1]) * r[i-1] ** 2),
            'Aa': 0,
            'aa': f[i-1] * (1 - r[i-1]) / (f[i-1] * (1 - r[i-1]) + (1 - f[i-1]) * (1 - r[i-1]) ** 2)
        }
        z_probs = np.array([zi_prob_map[key] for key in obs])
        z = np.random.uniform(size=z_probs.size) < z_probs
        n_ibd = z.sum()
        n_not_ibd = (~z).sum()
        f[i] = np.random.beta(n_ibd + prior_params[0], n_not_ibd + prior_params[1])
        # Get counts of genotypes given NOT inbred.
        types, not_ibd_type_counts = np.unique(obs[~z], return_counts=True)
        not_ibd_counts = defaultdict(int, zip(types, not_ibd_type_counts))
        nz_A = 2 * not_ibd_counts['AA'] + not_ibd_counts['Aa']
        nz_a = 2 * not_ibd_counts['aa'] + not_ibd_counts['Aa']
        # Get counts of genotypes given inbred.
        types, ibd_type_counts = np.unique(obs[z], return_counts=True)
        ibd_counts = defaultdict(int, zip(types, ibd_type_counts))
        z_A = ibd_counts['AA']
        z_a = ibd_counts['aa']
        r[i] = np.random.beta(prior_params[2] + nz_A + z_A, prior_params[3] + nz_a + z_a)
    return {'f': f, 'r': r}

out = gibbs_hw(niters=10000, data=(50, 21, 29))
plt.hist2d(out['f'], out['r'], bins=75)
plt.show()
```
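For intuition on why `f` and `r` can be drawn exactly inside the loop, here is a tiny self-contained illustration of the Beta-Binomial conjugate update (the counts `n`, `k` are made up for the example):

```python
import numpy as np

rng = np.random.default_rng(0)

# Conjugate update: a Beta(a, b) prior plus k successes out of n
# Binomial trials gives a Beta(a + k, b + n - k) posterior.
a, b = 1, 1        # uniform prior, as in prior_params above
n, k = 100, 30     # hypothetical trial counts for illustration

post_a, post_b = a + k, b + (n - k)

# The closed-form posterior mean ...
analytic_mean = post_a / (post_a + post_b)

# ... matches the mean of exact draws from the conjugate posterior.
draws = rng.beta(post_a, post_b, size=200_000)
print(round(analytic_mean, 3))  # 0.304
```

This is exactly why no accept/reject step is needed: given `Z`, the conditionals for `f` and `r` are Beta distributions you can sample directly.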
I would like to implement this in PyMC3 but I can’t figure out how. This posterior can be sampled in a different way using Metropolis or NUTS like so:
```python
import numpy as np
import pymc3 as pm

counts = np.array([50, 21, 29])
data_enum = {0: 'AA', 1: 'Aa', 2: 'aa'}
data = np.array([0] * counts[0] + [1] * counts[1] + [2] * counts[2])

with pm.Model() as hardy_weinberg:
    f = pm.Beta('f', alpha=1, beta=1)  # Uniform
    r = pm.Beta('r', alpha=1, beta=1)
    param1 = f * r + (1 - f) * r ** 2
    param2 = 2 * (1 - f) * r * (1 - r)
    param3 = f * (1 - r) + (1 - f) * (1 - r) ** 2
    genotype = pm.Categorical('genotype',
                              p=pm.math.stack([param1, param2, param3]),
                              observed=data)
```
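A quick PyMC-free sanity check on this parameterization: the three category probabilities should sum to one for any `f`, `r` in (0, 1), matching the conditionals used in the Gibbs sampler above. A small numpy sketch (the random `f`, `r` values are just for illustration):

```python
import numpy as np

# Category probabilities of the marginalized model:
#   P(AA) = f*r     + (1-f)*r**2
#   P(Aa) = 2*(1-f)*r*(1-r)
#   P(aa) = f*(1-r) + (1-f)*(1-r)**2
rng = np.random.default_rng(1)
f = rng.uniform(size=1000)
r = rng.uniform(size=1000)

p_AA = f * r + (1 - f) * r ** 2
p_Aa = 2 * (1 - f) * r * (1 - r)
p_aa = f * (1 - r) + (1 - f) * (1 - r) ** 2

# The probabilities form a valid simplex for every (f, r) pair.
print(np.allclose(p_AA + p_Aa + p_aa, 1.0))  # True
```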
So that’s pretty cool. But it’s not obvious to me how I’m supposed to implement a Gibbs sampler where the graph is cyclic: `Z` depends on `f` and `r`, and `f` and `r` depend on `Z`. I’m getting the impression that this isn’t one of the main use cases PyMC is designed for.
I found this page on making custom step classes: Using a custom step method for sampling from locally conjugate posterior distributions — PyMC documentation. But looking at the code, it seems like this just amounts to writing the sampler from scratch again using the numpy random samplers. Am I missing something with the available step classes?
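For what it’s worth, here is a framework-free sketch of what I understand that pattern to be: a step object whose `step(point)` method redraws one variable from its conjugate conditional with plain numpy. The class name `ConjugateFStep` is mine, not PyMC’s; the real version would subclass `BlockedStep` from `pymc3.step_methods.arraystep` as in the linked notebook.

```python
import numpy as np

class ConjugateFStep:
    """Sketch of a custom step: redraw f | z from its Beta conditional."""

    def __init__(self, prior_a=1, prior_b=1, rng=None):
        self.prior_a = prior_a
        self.prior_b = prior_b
        self.rng = rng if rng is not None else np.random.default_rng()

    def step(self, point):
        # point holds the current state; z is the boolean inbreeding
        # indicator vector. By conjugacy, f | z is Beta, so draw exactly.
        z = point["z"]
        new_point = dict(point)
        new_point["f"] = self.rng.beta(self.prior_a + z.sum(),
                                       self.prior_b + (~z).sum())
        return new_point

# Minimal usage: one conjugate update from a made-up starting point.
rng = np.random.default_rng(0)
stepper = ConjugateFStep(rng=rng)
point = {"z": rng.uniform(size=100) < 0.7, "f": 0.5}
point = stepper.step(point)
print(0.0 < point["f"] < 1.0)  # True
```

So the conjugate draws are still hand-written numpy; the step class is really just a wrapper that lets `pm.sample` alternate them with other samplers.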
PS: I love this community and this package! PyMC is sick. I hope I can contribute a good example or tutorial for other beginners.