I have a hierarchical model of homogeneous Poisson processes with many groups (3000+) of event sequences, which I assume all come from the same population. I model the per-group Poisson rates with a simple population hierarchy, using a non-centred parametrisation:
coords = {"obs": data.index.values, "group": group_coords}
with pm.Model(coords_mutable=coords) as model:
time_increments = pm.MutableData("time_increments", data.increment.values/365, dims="obs")
groups = pm.MutableData("groups", data.group_ix, dims="obs")
observed_counts = data.counts.values
pop_mean = pm.Normal("pop_mean", mu=10, sigma=4)
pop_sigma = pm.Exponential("pop_sigma", lam=1)
group_zscores = pm.Normal("group_z", mu=0, sigma=1, dims="group")
group_means = pm.Deterministic("group_means", pop_mean + pop_sigma * group_zscores, dims="group")
avg_increment_lambdas = pm.math.clip(group_means[groups], 0, float("inf"))
total_increment_lambdas = pm.Deterministic("total_increment_lambda", avg_increment_lambdas * time_increments, dims="obs")
counts = pm.Poisson("counts", mu=total_increment_lambdas, observed=observed_counts, dims="obs")
with model:
idata = pm.sample(chains=4, keep_warning_stat=True)
This produces many divergences:
```
Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 2579 seconds.
There were 118 divergences after tuning. Increase `target_accept` or reparameterize.
```
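For what it's worth, the divergence counts can also be read straight off the trace (`diverging` in `sample_stats` is standard PyMC output):

```python
# Per-chain divergence counts, straight from the sampler statistics
idata.sample_stats["diverging"].sum(dim="draw").values
```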
I was aware that I had a potential problem with the zero-clipping, but I concluded it wasn't an issue because, as far as I could tell, no samples were actually clipped (all `group_means` draws are positive):

```python
idata.posterior["group_means"].min().values
```

```
array(0.10084822)
```
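In hindsight, the minimum alone is probably a weak check; a slightly stronger one would look at how much posterior mass sits near the clip boundary (the 0.5 threshold here is an arbitrary choice, just for illustration):

```python
# Fraction of group_means draws that come within 0.5 of the clip boundary at zero
float((idata.posterior["group_means"] < 0.5).mean())
```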
So I spent a lot of time trying other things (changing `target_accept`, switching to a centred parametrisation, even digging cluelessly through the detailed divergence reports returned with `keep_warning_stat=True`), none of which worked. Eventually, for lack of anything else to try, I decided to get rid of the clipping:
```python
with pm.Model(coords_mutable=coords) as model2:
    time_increments = pm.MutableData("time_increments", data.increment.values / 365, dims="obs")
    groups = pm.MutableData("groups", data.group_ix, dims="obs")
    observed_counts = data.counts.values

    log_pop_mean = pm.Normal("pop_mean", mu=1.5, sigma=1)
    log_pop_sigma = pm.Exponential("pop_sigma", lam=10)

    # LogNormal keeps the group means positive by construction, so no clipping is needed
    group_means = pm.LogNormal("group_means", mu=log_pop_mean, sigma=log_pop_sigma, dims="group")

    avg_increment_lambdas = group_means[groups]
    total_increment_lambdas = pm.Deterministic("total_increment_lambda", avg_increment_lambdas * time_increments, dims="obs")

    counts = pm.Poisson("counts", mu=total_increment_lambdas, observed=observed_counts, dims="obs")

with model2:
    idata2 = pm.sample(chains=4, keep_warning_stat=True)
```
And lo and behold!
```
Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 1460 seconds.
```
So my questions are:
- Am I right that the clipping was at fault? (As opposed to some other change I made along the way in order to get rid of the clipping, like setting slightly different priors for the population mean and variance.) EDIT: I am aware that I now technically have a centred parametrisation with the LogNormal. I had tried a centred parametrisation with clipping, and that also gave me divergences.
- What, in general, is the main issue with clipping? Is it that it's non-differentiable? Or that it forces you to sample near the boundary (in the same way that MCMC sampling of a HalfNormal can cause divergences)? A small sketch of what I mean by the differentiability concern is after this list.
- Why did clipping cause issues in this particular case, considering that no samples were actually clipped?
- Was there any way to systematically diagnose this? A plot I could have made, some summary sampling statistic I could have looked at, some anomaly I could have looked for in the sampling reports?
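For the second question, here is a minimal sketch, outside the model and using plain PyTensor, of what I mean by the differentiability concern: `clip` is differentiable almost everywhere, but its gradient is identically zero on the clipped side of the boundary.

```python
import numpy as np
import pytensor
import pytensor.tensor as pt

# The same clipping operation as in the model, applied to a scalar for illustration
x = pt.dscalar("x")
y = pt.clip(x, 0.0, np.inf)
grad_fn = pytensor.function([x], pytensor.grad(y, x))

print(grad_fn(1.0))   # 1.0: the gradient passes through above the boundary
print(grad_fn(-1.0))  # 0.0: the gradient vanishes below it
```

My suspicion is that even if no accepted draw is ever clipped, the leapfrog trajectories can still wander into the flat region where the gradient gives the sampler no information, which is what the third question is getting at.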