Summary:

I have a Python code that utilizes the PyMC library for Bayesian modeling to compare two group performances. This code aims to do this for multiple groups (1000+). However, I’ve noticed that the code’s performance is suboptimal (100+ groups take 24 hours), especially when dealing with a large number of ad groups. I’m looking for advice and suggestions from the community on how to improve the code’s efficiency and make it run faster.

Here is the code I use

```
for compare_group_infos in zip(ad_group_list, prior_mean_a_list, prior_std_a_list, prior_mean_b_list, prior_std_b_list, obs_a_list, obs_b_list):
ad_group_id, prior_mean_a, prior_std_a, prior_mean_b, prior_std_b, obs_a, obs_b = compare_group_infos
with pm.Model() as model:
# Vectorized priors
ctrl_cpas = pm.TruncatedNormal('ctrl_cpas', mu=prior_mean_a, sigma=prior_std_a, lower=0)
treat_cpas = pm.TruncatedNormal('treat_cpas', mu=prior_mean_b, sigma=prior_std_b, lower=0)
# Vectorized delta
deltas = pm.Deterministic("deltas", treat_cpas - ctrl_cpas)
# Vectorized observations
pm.Exponential(f"obs_ctrl", 1 / ctrl_cpas, observed=obs_a)
pm.Exponential(f"obs_treat", 1 / treat_cpas, observed=obs_b)
# Sampling
trace = pm.sample(20000, chains=4)
print('model complete')
delta_samples = np.concatenate(trace.posterior.deltas.data[:, 10000:])
win_ratio = np.mean(delta_samples < 0)
out_dict[ad_group_id] = {'win_ratio': win_ratio, 'control_cpx': np.mean(obs_a), 'treat_cpx': np.mean(obs_b)}
print('evaluation done: %s with win_ratio %f' % (ad_group_id, win_ratio))
```

I previously did some research and plan to try it on GPU or make it parallel following this thread(MCMC for big datasets: faster sampling with JAX and the GPU - PyMC Labs), but I then realized my problem might be different from a standard case, where single run seems ok to be finished within 10 mins.