I’m trying to model a gamma distribution whose shape parameter is defined as the sum of the shape parameters of two gamma-distributed variables:
T ~ Gamma(a+b, lambda),
where a and b are the shape parameters and the rate parameter lambda is identical in both gamma distributions. The example comes from Blitzstein & Hwang (2015), ch. 8.5 on beta-gamma connections. I have a variable defined similarly to the one in their example and wanted to model it using this formulation as well. Assuming I can adequately estimate the underlying parameters of my process, I tried to implement the probability model below, adopting the prior and posterior predictive check workflow from the examples page.
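As a sanity check on the property itself, here is a minimal numpy-only simulation (illustrative parameter values, chosen to match the priors I use below) showing that the sum of two independent gammas with a common rate is gamma with the summed shape:

```python
import numpy as np

rng = np.random.default_rng(42)
a, b, lam = 1.0, 1.7, 0.5  # illustrative values
n = 200_000

# numpy parameterizes gamma by shape and *scale* = 1/rate
x = rng.gamma(shape=a, scale=1 / lam, size=n)
y = rng.gamma(shape=b, scale=1 / lam, size=n)
t = x + y  # should be distributed as Gamma(a + b, lam)

direct = rng.gamma(shape=a + b, scale=1 / lam, size=n)

# Moments should agree: E[T] = (a+b)/lam = 5.4, Var[T] = (a+b)/lam**2 = 10.8
print(t.mean(), direct.mean())  # both ≈ 5.4
print(t.var(), direct.var())    # both ≈ 10.8
```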
The variable:
transformed_dispersion
array([0.56464136, 0.544074 , 0.54767713, 0.96010642, 0.55094633,
0.54671792, 0.54519951, 0.54318711, 1.30211609, 0.54587553,
0.54579041, 0.54559012, 0.54449852, 0.54477824, 0.54341431,
0.89046267, 0.54415172, 0.54503546, 0.88186893, 0.54354371,
0.54484279, 0.78528687, 0.54671329, 0.54525302, 0.5444967 ,
0.54473892, 0.54320891, 0.76939255, 0.8488386 , 0.59831084,
1.61372099, 0.92526673, 0.54835539, 0.55301413, 0.55960888,
1.0766595 , 0.80705567, 0.86671976, 0.72298249, 0.76118831,
0.79509659, 0.54580728, 0.54514719, 1.00622038, 0.77159335,
0.85368028, 0.89595386, 0.83969437, 0.77304834, 0.9339557 ,
0.79701339, 0.54771662, 0.55438468, 0.88408173, 1.01462635,
0.54626592, 0.94273911, 1.12030071, 0.5485798 , 1.30943247,
0.54543916, 0.54190064, 1.32258268, 0.54568767, 0.91973358,
0.84410375, 0.5443534 , 0.54377197, 0.54519768, 0.54539762,
0.54488623, 0.54560625, 0.54454563, 1.52545962, 0.54577678,
0.54612133, 0.54560943, 0.54523644, 0.54546741, 2.73341528,
0.55144868, 0.54929153, 0.56087582, 1.17688901, 1.07010819,
0.85147739, 0.94147768, 0.54881994, 0.54744296, 1.35048325,
1.17996367, 0.55078861, 0.74361047, 1.0098548 , 0.55733456,
0.55530399, 0.55239452, 1.04952795, 0.89954851, 0.86357334,
0.5581642 , 1.04565779, 1.05150538, 1.04779526, 0.55450942,
0.94367635, 0.96010688, 0.71399139, 0.55458086, 0.54810976,
3.10544337, 0.55473681, 0.551803 , 0.55724666, 0.57173589,
0.56491214, 1.45542065, 0.98102387, 0.54792521, 0.5582697 ,
1.03444558, 1.44342268, 0.55523299, 0.5521674 , 0.55767641,
0.55915942, 1.30490541, 1.45154838, 1.87692049, 3.00020164,
1.25916979, 0.8893854 , 1.17207219, 1.28325117, 0.55889352,
1.02714597, 0.87394816, 0.55073515, 0.87837141, 1.03374477,
0.85152393, 0.94548098, 1.12847992, 1.00038457, 0.90305005,
0.91964807, 0.54994632, 0.55014778, 0.55077364, 0.55103734,
0.95452579, 0.55105353, 1.05731227, 0.55028822, 1.06512706,
0.98798462, 0.921823 , 1.00637416, 0.88531057])
The model implementation for priors:
import arviz as az
import numpy as np
import pymc as pm

rng = np.random.default_rng(42)

with pm.Model() as dispersion_model:
    # priors on the two shape parameters
    shape_a = pm.Normal("shape_a", 1.0, 0.1)
    shape_b = pm.Normal("shape_b", 1.7, 0.1)
    # gamma-distributed shapes, summed into the shape of the likelihood
    alpha_a = pm.Gamma("alpha_a", alpha=shape_a, beta=0.5)
    alpha_b = pm.Gamma("alpha_b", alpha=shape_b, beta=0.5)
    alpha_ = pm.Deterministic("alpha_", alpha_a + alpha_b)
    lambda_ = pm.Normal("lambda_", 0.5, 0.1)
    pm.Gamma("dispersion", alpha=alpha_, beta=lambda_, observed=transformed_dispersion)
    idata = pm.sample_prior_predictive(samples=150, random_seed=rng)
The summary of the priors.
I also plotted the PPC for the priors (if this isn’t normal practice, let me know why).
az.plot_ppc(idata, group="prior", num_pp_samples=150);
I notice the awkward density plot; the prior predictive mean seems to be the source of my issue, and I’ll point that out below, but the prior predictive is more what I’m aiming for. I might add that I simplified the original formulation in Blitzstein & Hwang: they defined the parameters a and b as the shapes of the two gamma distributions, but also equated them to the gamma-distributed variables themselves when formulating the beta-distributed variable. So I, perhaps foolishly, generalized the shapes as gamma distributions. If you substitute shape_a and shape_b for alpha_a and alpha_b in alpha_ (commenting out alpha_a and alpha_b), the result of the estimation is worse.
I ran the model afterwards.
with dispersion_model:
    idata.extend(pm.sample(1000, tune=2000, random_seed=rng))
az.plot_trace(idata);
Notice the divergences; the estimates are horrendous. Here’s the summary and the PPC sampling plot.
az.summary(idata, var_names=["alpha_", "lambda_"], round_to=3)
with dispersion_model:
    pm.sample_posterior_predictive(idata, extend_inferencedata=True, random_seed=rng)
My first question: what may be causing the divergences? They persist across every implementation I try.
Also, notice that the predictive mean of alpha_ changes drastically from prior to posterior. Why is that, exactly? I’m not entirely certain why that would happen; I would expect some difference, but not this much. I think the prior predictive mean issue can be addressed by prior predictive checking my shape parameters, but the issue is the change during posterior predictive sampling. Are there any suggestions on how to resolve that, and perhaps the convergence failures?