I came to a slightly different solution which, although worked, is much slower in my toy example. However, when applied to real data (100,000 observations with 15 categories), “my” approach is much faster than yours. Do you have any idea why?
category_str = np.array(category_str)
outcome = np.array(outcome)
with pm.Model() as model:
probs = []
for cat in ('a', 'b', 'c'):
p_ = pm.Beta(f'p_{cat}', alpha=1, beta=1)
cat_outcomes = pm.Bernoulli(f'outcome_{cat}', p=p_, observed=outcome[category_str == cat])
probs.append(cat_outcomes)
trace = pm.sample(1000, tune=1000)