Can I avoid sampling irrelevant bits of the distribution?

I discovered pymc3 last month and started playing with it to fit some simple models. For instance, imagine I have:

\alpha, \beta \sim \mathrm{Exp}(1/10) (rate 1/10, i.e. mean 10)
p_i \sim \mathrm{Beta}(\alpha, \beta)
N_i \sim \mathrm{Binom}(ss_i, p_i) (observed data)
M_i \sim \mathrm{Binom}(ps_i, p_i)

where i runs from 1, \dots, n. There could be a story associated with it: there are n hospitals with 100 patients each, and in each hospital we ask 10 patients whether they are happy. We are interested in statistics of \sum_i (N_i + M_i), for example because we want to know how many people are happy across all hospitals, or because we want to check whether we can be more than 90% certain that the overall satisfaction rate exceeds some threshold.

I found it really easy to build this model with PyMC3 and sample from the posterior. To have a working example, I will first generate some data:

import numpy as np
import pymc3 as pm

np.random.seed(17610407)
sample_sizes = [10, 20, 30, 40]   # ss_i: patients surveyed in each hospital
n = len(sample_sizes)
population_sizes = [100] * n      # ps_i: population size in each hospital

# draw "true" parameters and the observed counts from the generative model
alpha_, beta_ = pm.Exponential.dist(0.1).random(size=2)
p_ = pm.Beta.dist(alpha_, beta_).random(size=n)
N_ = pm.Binomial.dist(sample_sizes, p_).random()

with pm.Model() as demo:
    alpha = pm.Exponential('alpha', 0.1)
    beta = pm.Exponential('beta', 0.1)
    p = pm.Beta('p', alpha, beta, shape=n)
    N = pm.Binomial('N', sample_sizes, p, observed=N_)
    M = pm.Binomial('M', population_sizes, p, shape=n) # ?
    
pm.model_to_graphviz(demo)

(Graphviz rendering of the model.)

But unfortunately, this will sometimes run into problems, I think because we are mixing the discrete M_i variables with the continuous p_i variables. Namely, if I sample from the posterior …

with demo:
    trace = pm.sample(10000, chains=5, tune=2000, cores=5)
pm.summary(trace)

… this gives me very few effective samples for the p's: around 200. It also complains about many divergences, although I am not sure whether these are a problem. However, if I remove the M node from the model and sample again, I get about 40k-50k effective samples and far fewer divergences.
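
For reference, the number of divergent transitions can be checked directly from the sampler statistics; a small sketch, assuming the trace object returned by pm.sample above:

divergent = trace.get_sampler_stats('diverging')  # one flag per transition
print('number of divergences:', divergent.sum())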

So it seems to me that I can get better convergence by first removing M and sampling from the joint posterior over \alpha, \beta and p. Afterwards, I can compute M for each posterior sample by drawing from the corresponding binomial distribution, along the lines of the sketch below.
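
Something like this is what I have in mind (assuming trace comes from the model without the M node, so trace['p'] has one row per posterior draw):

p_samples = trace['p']                                        # shape (n_draws, n)
M_samples = np.random.binomial(population_sizes, p_samples)   # broadcasts over draws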

  • Is this a valid approach?
  • Is this ‘trick’ something pymc3 can do for me if I specify some options to the sampler?

Yes, and it is usually a better approach: NUTS can sometimes interact badly with the discrete samplers needed for variables like M.

One trick is to run the sampler on the model without M, and then add M as a new node for posterior predictive sampling:

with pm.Model() as demo:
    alpha = pm.Exponential('alpha', 0.1)
    beta = pm.Exponential('beta', 0.1)
    p = pm.Beta('p', alpha, beta, shape=n)
    N = pm.Binomial('N', sample_sizes, p, observed=N_)
    trace = pm.sample()  # only the continuous variables are sampled here

with demo:
    # add M afterwards and draw it conditional on the posterior samples of p
    M = pm.Binomial('M', population_sizes, p, shape=n)
    posterior_predictive = pm.sample_posterior_predictive(trace, vars=[M])
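
The statistics asked about in the original post can then be computed from the predictive draws; a rough sketch, assuming the variables defined above and treating the surveyed patients and the wider populations as disjoint groups:

M_draws = posterior_predictive['M']                       # shape (n_draws, n)
total_happy = N_.sum() + M_draws.sum(axis=1)              # happy patients per draw
total_people = sum(sample_sizes) + sum(population_sizes)
overall_rate = total_happy / total_people
print((overall_rate > 0.5).mean())                        # e.g. P(overall rate > 0.5)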