I have a question about how to improve model convergence and sampling speed.

Currently, I am fitting a hierarchical model with Negative Binomial(alpha, mu) distributed likelihood. The priors for the parameters alpha and mu are Exponential(hyper_lam) and Gamma(hyper_mu, hyper_sd) distributed respectively. The hyperpriors are flat Uniform distributions for hyper_lam, hyper_mu and hyper_sd. I wish to generate not only the distributions of the parameters but also the posterior predictive distribution as well.

I am fitting approximately 14,000 observations with 200 classes to this model and the data itself is count data on the order of magnitude of 10^5.

On a modestly powered laptop, the model only goes through about 2 iterations a second using a Metropolis step. It has even less iterations using NUTS.

Is there anything I can change in the model or the data to improve sampling speed or require less than 200,000 samples for convergence?

Code used for model shown below.

```
encoder = preprocessing.LabelEncoder()
classes_index = encoder.fit_transform(df['hierarchical_group_subset'])
classes = encoder.classes_
upper_bound_lam = math.floor(counts.quantile(q=0.9))
upper_bound_mu = math.floor(means.quantile(q=0.9))
upper_bound_sd = math.floor(means.std())
with pm.Model() as model:
hyper_alpha_lam = pm.Uniform('hyper_alpha_lam', lower=0, upper=upper_bound_lam)
hyper_mu_mu = pm.Uniform('hyper_mu_mu', lower=0, upper=upper_bound_mu)
hyper_mu_sd = pm.Uniform('hyper_mu_sd', lower=0, upper=upper_bound_sd)
alpha = pm.Exponential(
'alpha', lam=hyper_alpha_lam, shape=len(classes)
)
mu = pm.Gamma(
'mu', mu=hyper_mu_mu, sd=hyper_mu_mu, shape=len(classes)
)
y_est = pm.NegativeBinomial(
'y_est',
mu=mu[classes_index],
alpha=alpha[classes_index],
observed=df['count'].values
)
y_pred = pm.NegativeBinomial(
'y_pred',
mu=mu[classes_index],
alpha=alpha[classes_index],
shape=df['hierarchical_group_subset'].shape
)
start = pm.find_MAP()
step = pm.Metropolis()
hierarchical_trace = pm.sample(500, start=start, step=step, progressbar=True)
```