Hierarchical model for counts with categorical data - help structuring

Hi all,

New here and working my way through examples from various references. I have seen multiple ways to structure hierarchical models with categorical variables. As a toy example, say I have a system that collects overdispersed count data (say, units consumed) which can be broken down into multiple phases, is associated with a geographic hierarchy (e.g. local, regional, etc.), and where multiple processes are put into effect to influence the count outcome.
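For context, here is a quick way to simulate made-up data of that shape. The column and category names are just placeholders I invented for illustration; the gamma-mixed Poisson draw is one standard way to get overdispersed (negative-binomial-like) counts:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n = 500

# Hypothetical categorical structure: phases, geographic levels, processes.
df = pd.DataFrame({
    'phase': rng.choice(['setup', 'run', 'teardown'], size=n),
    'region': rng.choice(['local', 'regional', 'national'], size=n),
    'process': rng.choice(['A', 'B'], size=n),
})

# Overdispersed counts: a gamma-mixed Poisson is a negative binomial,
# so the sample variance exceeds the sample mean.
lam = rng.gamma(shape=2.0, scale=5.0, size=n)
df['count'] = rng.poisson(lam)

assert df['count'].var() > df['count'].mean()  # overdispersion check
```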

I’ve put together the following model structure, thinking I could model each process separately. I have seen examples of this structure online and in reference materials.

count = data['count'].values
phase_idx, phase = data['phase'].factorize(sort=True)
region_idx, region = data['region'].factorize(sort=True)
process_idx, process = data['process'].factorize(sort=True)

COORDS = {
    'obs_id': np.arange(len(phase_idx)),
    'phase': phase,
    'region': region,
    'process': process
}
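For anyone unfamiliar with the pattern: `factorize(sort=True)` returns an integer index per row plus the sorted unique labels, which is what makes the fancy indexing into μ and α work. A minimal standalone illustration:

```python
import pandas as pd

s = pd.Series(['b', 'a', 'b', 'c'])
idx, labels = s.factorize(sort=True)

# idx maps each row to its position in the sorted labels:
# 'b' -> 1, 'a' -> 0, 'b' -> 1, 'c' -> 2
assert list(idx) == [1, 0, 1, 2]
assert list(labels) == ['a', 'b', 'c']
```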

with pm.Model(coords=COORDS) as count_model:
    data = pm.Data('counts', count, dims='obs_id')

    ## Hyperpriors
    mu_lam = pm.Exponential('mu_lam', lam=0.3)
    a_lam = pm.Exponential('a_lam', lam=2)

    ## Priors
    μ = pm.Exponential('μ', lam=mu_lam, dims=('phase', 'region'))
    α = pm.Exponential('α', lam=a_lam, dims=('phase', 'region'))

    ## Likelihood
    obs = pm.NegativeBinomial('obs', mu=μ[phase_idx, region_idx],
                              alpha=α[phase_idx, region_idx],
                              observed=data, dims='obs_id')

    idata_trace = pm.sample(draws=10_000, tune=5_000, chains=2, target_accept=0.95)

Is this a valid way to structure the model? It samples fine with made-up data and provides the output of interest: the uncertainty around μ for each combination of categorical variables.

How about all three variables? Unfortunately, adding the third variable keeps killing my kernel.

## Priors now use dims=('phase', 'region', 'process')
## Likelihood
obs = pm.NegativeBinomial('obs', mu=μ[phase_idx, region_idx, process_idx],
                          alpha=α[phase_idx, region_idx, process_idx],
                          observed=data, dims='obs_id')

Thanks in advance!


I’d like to think a bit more deeply about this, but at first glance this looks good to me.

Just a question to clarify: when does the kernel get killed? Is it during sampling, or before it even gets going? And how many parameters do you end up estimating when you add the third variable?

I’m asking because I’m wondering if the dead kernel might actually be your computer running out of memory! If your dataset is large and you have a lot of parameters, that can happen, especially because you’re taking 10,000 draws on each of two chains, which is a lot. Is there a reason you’re drawing so many? In general, 1,000 draws per chain should be more than sufficient.
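To make the memory point concrete, here is a rough back-of-the-envelope accounting. The category counts are hypothetical; plug in your own. With two parameter tables (μ and α) over all three dimensions, the trace size grows multiplicatively:

```python
# Hypothetical category counts for the three-way model.
n_phase, n_region, n_process = 5, 10, 8

params_per_table = n_phase * n_region * n_process   # 400 cells per table
n_params = 2 * params_per_table                     # one table each for μ and α

# Trace storage at the sampler settings from the post.
draws, chains = 10_000, 2
stored = n_params * draws * chains                  # float64 values retained
mem_mb = stored * 8 / 1e6                           # 8 bytes per float64

print(n_params, mem_mb)  # 800 parameters, 128.0 MB for the trace alone
```

That is just the retained trace; tuning, sampler internals, and ArviZ post-processing add on top of it, so the multiplier matters more than any single number here.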