Hey! I have a model of aggregated binomial data per group. In total, I have ~50k groups, each with thousands of trials (and a low number of successes). So my data looks something like:
import pandas as pd
import bambi as bmb # version 0.10.0
df_simple = pd.DataFrame({
'group': ['A', 'B', 'C'],
'y': [10, 20, 30],
'n': [100000, 5000, 900000]
})
m = bmb.Model('p(y, n) ~ (1|group)', data=df_simple, family='binomial')
idata = m.fit()
I also use informative priors. What are the options for improving the performance of the model? I'm using inference_method="nuts_blackjax"
on CPU, which seems a bit faster on a subset of the data than "standard" MCMC. Any other recommendations or resources? Thank you!
Hi Miha!
If your model only has the group
predictor, the PyMC version wouldn't be too complicated. I think you can do something like
import numpy as np
import pymc as pm

y = df_simple["y"].to_numpy()
n = df_simple["n"].to_numpy()
# return_inverse=True gives an integer index mapping each row to its group
groups, groups_idx = np.unique(df_simple["group"], return_inverse=True)
coords = {"group": groups}

with pm.Model(coords=coords) as model:
    intercept_mu = pm.Normal("intercept_mu", mu=0, sigma=1)
    intercept_sigma = pm.HalfNormal("intercept_sigma", sigma=1)
    intercept_offset = pm.Normal("intercept_offset", dims="group")
    intercept = pm.Deterministic(
        "intercept", intercept_mu + intercept_sigma * intercept_offset, dims="group"
    )
    p = pm.math.invlogit(intercept[groups_idx])
    pm.Binomial("outcome", p=p, n=n, observed=y)
which uses a non-centered parameterization.
This PyMC model is equivalent to the Bambi one and may well be faster.