I discovered PyMC3 last month and started playing with it to fit some simple models. For instance, imagine I have:
\alpha, \beta \sim \mathrm{Exp}(1/10)
p_i \sim \mathrm{Beta}(\alpha, \beta)
N_i \sim \mathrm{Binom}(ss_i, p_i) (observed data)
M_i \sim \mathrm{Binom}(ps_i, p_i)
where i runs from 1 \dots n, ss_i is the sample size of group i, and ps_i is its population size. There could be a story associated with this, such as: there are n hospitals with 100 patients each, and in each hospital we ask a sample of the patients whether they are happy. We would be interested in statistics of \sum_i (N_i + M_i), because we want to know how many people are happy across all hospitals, or we want to see whether we can be more than 90% certain that the overall satisfaction rate is higher than some number.
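In symbols (my own shorthand, with r standing for "some number", and counting the sampled and unsampled people separately), the second question asks whether \Pr\left(\frac{\sum_i (N_i + M_i)}{\sum_i (ss_i + ps_i)} > r \,\middle|\, \text{data}\right) > 0.9.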
I found it really easy to build this model with PyMC3 and sample from the posterior. I will generate some data first, so that there is a working example:
import numpy as np
import pymc3 as pm

np.random.seed(17610407)

sample_sizes = [10, 20, 30, 40]
n = len(sample_sizes)
population_sizes = [100] * n

# draw 'true' values for alpha, beta and p, and the observed counts N
alpha_, beta_ = pm.Exponential.dist(0.1).random(size=2)
p_ = pm.Beta.dist(alpha_, beta_).random(size=n)
N_ = pm.Binomial.dist(sample_sizes, p_).random()
with pm.Model() as demo:
    alpha = pm.Exponential('alpha', 0.1)
    beta = pm.Exponential('beta', 0.1)
    p = pm.Beta('p', alpha, beta, shape=n)
    N = pm.Binomial('N', sample_sizes, p, observed=N_)
    M = pm.Binomial('M', population_sizes, p, shape=n)  # latent, not observed -- is this the right way?

pm.model_to_graphviz(demo)
But unfortunately, this sometimes runs into problems. I think it is because we are mixing the discrete M_i distributions with the continuous p_i distributions, so PyMC3 cannot use NUTS for everything and falls back to a compound step (Metropolis for M). Namely, if I sample from the posterior …
with demo:
    trace = pm.sample(10000, chains=5, tune=2000, cores=5)

pm.summary(trace)
… this gives me very few effective samples for the p’s: around 200. It also complains about many divergences, although I am not sure whether these are a problem. However, if I remove the M node from the model and do everything again, I get about 40k-50k effective samples and far fewer divergences.
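For reference, this is how I count the divergences (assuming I am reading the PyMC3 API right; 'diverging' is the sampler statistic that NUTS reports):

n_divergent = trace['diverging'].sum()  # total divergent transitions over all chains
print(n_divergent, 'divergences in', len(trace) * trace.nchains, 'draws')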
So it seems to me that I can get better convergence by first removing M and sampling from the joint posterior over \alpha, \beta and p. Afterwards, I can compute M for each posterior sample by drawing from the binomial distribution, as sketched below.
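To make the second step concrete, here is a minimal sketch of what I have in mind (assuming trace comes from the model with the M node removed, and using 0.5 as a purely illustrative threshold):

p_post = trace['p']                                    # posterior draws of p, shape (n_draws, n)
M_post = np.random.binomial(population_sizes, p_post)  # one draw of M per posterior sample
total_happy = N_.sum() + M_post.sum(axis=1)            # the N_i are observed, hence fixed
total_people = sum(sample_sizes) + sum(population_sizes)
print((total_happy / total_people > 0.5).mean())       # estimate of P(overall rate > 0.5 | data)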
- Is this a valid approach?
- Is this ‘trick’ something PyMC3 can do for me if I specify some options to the sampler?