Formulating a gamma distribution to model an r.v.

I’m trying to model a gamma distribution whose shape parameter is the sum of the shape parameters of two gamma-distributed variables:

T ~ Gamma(a+b, lambda),

where a, b are the shape parameters and the rate parameter is identical in both gamma distributions. The example comes from Blitzstein & Hwang (2015), ch. 8.5 on beta-gamma connections. I have a variable defined similarly to the one in their example and wanted to model it using this formulation. Assuming I can adequately estimate the underlying parameters in my process, I tried to implement the probability model as below, adopting the prior and posterior predictive check workflow from the examples page.
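As a quick sanity check on the identity itself, the sum of two independent gammas with a common rate should again be gamma with the summed shape. A minimal sketch (the values of a, b, and lam here are illustrative, not estimates from my data):

```python
import numpy as np

# Empirically check the convolution identity behind the model:
# Gamma(a, lam) + Gamma(b, lam) ~ Gamma(a + b, lam).
a, b, lam = 1.0, 1.7, 0.5
rng = np.random.default_rng(42)

# numpy parameterizes the gamma by shape and scale = 1 / rate
x = rng.gamma(shape=a, scale=1 / lam, size=200_000)
y = rng.gamma(shape=b, scale=1 / lam, size=200_000)
t = x + y

# Gamma(a + b, lam) has mean (a + b) / lam and variance (a + b) / lam**2
print(t.mean())  # ~ 5.4
print(t.var())   # ~ 10.8
```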

The variable:

transformed_dispersion

array([0.56464136, 0.544074  , 0.54767713, 0.96010642, 0.55094633,
       0.54671792, 0.54519951, 0.54318711, 1.30211609, 0.54587553,
       0.54579041, 0.54559012, 0.54449852, 0.54477824, 0.54341431,
       0.89046267, 0.54415172, 0.54503546, 0.88186893, 0.54354371,
       0.54484279, 0.78528687, 0.54671329, 0.54525302, 0.5444967 ,
       0.54473892, 0.54320891, 0.76939255, 0.8488386 , 0.59831084,
       1.61372099, 0.92526673, 0.54835539, 0.55301413, 0.55960888,
       1.0766595 , 0.80705567, 0.86671976, 0.72298249, 0.76118831,
       0.79509659, 0.54580728, 0.54514719, 1.00622038, 0.77159335,
       0.85368028, 0.89595386, 0.83969437, 0.77304834, 0.9339557 ,
       0.79701339, 0.54771662, 0.55438468, 0.88408173, 1.01462635,
       0.54626592, 0.94273911, 1.12030071, 0.5485798 , 1.30943247,
       0.54543916, 0.54190064, 1.32258268, 0.54568767, 0.91973358,
       0.84410375, 0.5443534 , 0.54377197, 0.54519768, 0.54539762,
       0.54488623, 0.54560625, 0.54454563, 1.52545962, 0.54577678,
       0.54612133, 0.54560943, 0.54523644, 0.54546741, 2.73341528,
       0.55144868, 0.54929153, 0.56087582, 1.17688901, 1.07010819,
       0.85147739, 0.94147768, 0.54881994, 0.54744296, 1.35048325,
       1.17996367, 0.55078861, 0.74361047, 1.0098548 , 0.55733456,
       0.55530399, 0.55239452, 1.04952795, 0.89954851, 0.86357334,
       0.5581642 , 1.04565779, 1.05150538, 1.04779526, 0.55450942,
       0.94367635, 0.96010688, 0.71399139, 0.55458086, 0.54810976,
       3.10544337, 0.55473681, 0.551803  , 0.55724666, 0.57173589,
       0.56491214, 1.45542065, 0.98102387, 0.54792521, 0.5582697 ,
       1.03444558, 1.44342268, 0.55523299, 0.5521674 , 0.55767641,
       0.55915942, 1.30490541, 1.45154838, 1.87692049, 3.00020164,
       1.25916979, 0.8893854 , 1.17207219, 1.28325117, 0.55889352,
       1.02714597, 0.87394816, 0.55073515, 0.87837141, 1.03374477,
       0.85152393, 0.94548098, 1.12847992, 1.00038457, 0.90305005,
       0.91964807, 0.54994632, 0.55014778, 0.55077364, 0.55103734,
       0.95452579, 0.55105353, 1.05731227, 0.55028822, 1.06512706,
       0.98798462, 0.921823  , 1.00637416, 0.88531057])

The model implementation for priors:

import arviz as az
import numpy as np
import pymc as pm

rng = np.random.default_rng(42)  # seed chosen arbitrarily, for reproducibility

with pm.Model() as dispersion_model:

    # Hyperpriors for the two shape parameters
    shape_a = pm.Normal("shape_a", 1.0, 0.1)
    shape_b = pm.Normal("shape_b", 1.7, 0.1)

    alpha_a = pm.Gamma("alpha_a", alpha=shape_a, beta=0.5)
    alpha_b = pm.Gamma("alpha_b", alpha=shape_b, beta=0.5)

    # Total shape: the sum of the two gamma-distributed shapes
    alpha_ = pm.Deterministic("alpha_", alpha_a + alpha_b)

    lambda_ = pm.Normal("lambda_", 0.5, 0.1)

    pm.Gamma("dispersion", alpha=alpha_, beta=lambda_, observed=transformed_dispersion)
    idata = pm.sample_prior_predictive(samples=150, random_seed=rng)

The summary of the priors.

I also plotted the PPC for priors (if this isn’t normal practice, lmk why).

az.plot_ppc(idata, group="prior", num_pp_samples=150);

The density plot looks awkward, and the prior predictive mean seems to be the source of my issue; I’ll point that out below, but the prior predictive itself is closer to what I’m aiming for. I should add that I simplified the original formulation in Blitzstein & Hwang: they defined the parameters a and b as the shapes of the two gamma distributions, but when constructing the beta-distributed variable they also equated them with the gamma-distributed variables themselves. So I, perhaps foolishly, generalized the shapes into gamma distributions. If you substitute shape_a, shape_b for alpha_a, alpha_b in alpha_ (commenting out alpha_a, alpha_b), the result of the estimation is even worse.

I ran the model afterwards.

with dispersion_model:
    idata.extend(pm.sample(1000, tune=2000, random_seed=rng))

az.plot_trace(idata);

Notice the divergences and estimates are horrendous. Here’s the summary and PPC sampling plot.

az.summary(idata, var_names=["alpha_", "lambda_"], round_to=3)

with dispersion_model:
    pm.sample_posterior_predictive(idata, extend_inferencedata=True, random_seed=rng)

My first question is: what may be causing the divergences? They persist under every implementation variant I’ve tried.

Also, notice that the predictive mean of alpha_ changes drastically from prior to posterior. Why is that, exactly? I would expect some difference, but not one this large. I think the prior predictive mean issue could be addressed by prior predictive checking my shape parameters, but the real problem is the change during posterior predictive sampling. Are there any suggestions on how to resolve that, and perhaps the convergence failures?

Can you provide a runnable example? Without observed data, the model cannot be run right now.

The variable transformed_dispersion is the observed data?

Ah, apologies. I skipped the giant wall of floats :joy: .

No problem hahaha it is awkwardly formatted.

So if you look at the full summary (not just alpha_ and lambda_), it seems pretty clear that the issue is with alpha_a and alpha_b. This makes sense because you are creating alpha_ as the sum of these two parameters. But I am not sure how you expect to identify these two parameters. Even if their sum is reasonably identifiable, the individual parameters will not be (i.e., there will be an infinite set of value pairs that give you the exact same sum). Are you reasonably confident that this model is really what you are after?
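To see this concretely: the likelihood only ever sees the sum, so any pair with the same total is indistinguishable. A quick sketch (the data values and rate here are illustrative):

```python
import numpy as np
from scipy import stats

# A few stand-in observations on the scale of transformed_dispersion
data = np.array([0.56, 0.54, 0.96, 1.30, 0.89])
lam = 0.5

def loglik(alpha_a, alpha_b):
    # Observation model: data ~ Gamma(alpha_a + alpha_b, lam).
    # Only the sum alpha_a + alpha_b enters the density.
    return stats.gamma(a=alpha_a + alpha_b, scale=1 / lam).logpdf(data).sum()

print(loglik(1.0, 2.0))  # identical to the line below...
print(loglik(2.5, 0.5))  # ...because both sums equal 3.0
```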

So, the alpha_a, alpha_b parameters aren’t identifiable under this formulation? I’d say conceptually this model is what I’m after, but I’m thrown for a loop if the formulation is a problem :sweat_smile:

Well, if I told you that alpha_ was definitely exactly 1.0, what does that tell you about the value of alpha_a? What does it tell you about the value of alpha_b? You can see the tradeoff/dependency in the pair plot:

with dispersion_model:
    az.plot_pair(idata, var_names=["alpha_", "alpha_a", "alpha_b"], divergences=True)

The plot cut off the axis labels, but I’m assuming alpha_a is the top y-axis, alpha_b the bottom y-axis, alpha_ the left x-axis, and alpha_a the right x-axis. If so, it makes more sense now: there’s no trend or relationship that could pin down the values of alpha_ or even describe the observed data. I think defining shape_a, shape_b as hyperparameters for alpha_a, alpha_b was a bit foolish and overly complicates whatever relationship could be estimated. Otherwise, perhaps a mixture model is a better solution. Thanks for the demonstration, it helped make sense of why this wouldn’t work!
