Hi everyone!

I have no previous experience with marginalizing discrete variables out of a model so that PyMC3 can use NUTS, and I don't see how to generalize the cases discussed in the literature to mine - I am probably missing something.

Here is the model I want to work with:
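In outline, the likelihood (as in the simulation code below) is

e_i \mid c_i, \zeta \;\sim\; \mathcal{N}\!\left(\mu_i(\zeta, c_i),\ \sigma^2\right)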

Where:

- \textbf{e} is a vector whose i-th element is the cost for observation i (observed)
- \textbf{c} is a vector whose i-th element is the category of observation i (observed)
- \sigma, \alpha_0, \alpha_1 are continuous (unobserved)
- \zeta is discrete (this is the problematic one), taking one of m possible values (unobserved)

In a different notation and somewhat more specifically:
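Concretely, with a normal likelihood as in the simulation code below:

e_i \sim \mathcal{N}(\mu_i, \sigma^2), \qquad \mu_i = \alpha_0 + \alpha_1 \, L_{\zeta,\, c_i}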

Where L is a (known) matrix whose (i,j)-th element is some feature, determined by the i-th possible value of \zeta and the j-th category, that the cost depends on (among other things).

Ultimately, I am interested in finding the probability of each possible \zeta being the true one.

Here’s the code to produce some fake data:

```
import numpy as np

# true parameter values
sigma = 0.1
a_0, a_1 = 0, 1
z = 3  # true value of zeta (a row index into L)
L = np.array([
    [1, 2, 1],
    [3, 2, 1],
    [1, 4, 5],
    [5, 6, 8],
])
# category of the i-th observation: 1000 observations per category
category_i = np.repeat(np.arange(L.shape[1]), 1000)
mu_i = a_0 + a_1 * L[z, category_i]
outcome_i = np.random.normal(loc=mu_i, scale=sigma)
```

And here is the most straightforward PyMC3 model to fit the data:

```
import pymc3 as pm
import theano

with pm.Model() as model:
    sigma = pm.HalfNormal('sigma', sigma=2)
    a_0 = pm.Normal('a_0', mu=0, sigma=1)
    a_1 = pm.Normal('a_1', mu=0, sigma=1)
    # uniform prior over the m possible values of zeta
    z = pm.Categorical('z', p=np.ones(L.shape[0]) / L.shape[0])
    mu_i = a_0 + a_1 * theano.shared(L)[z][category_i]
    outcomes = pm.Normal(
        'outcomes',
        mu=mu_i,
        sigma=sigma,
        observed=outcome_i,
    )
    trace = pm.sample(
        cores=1,
        return_inferencedata=True,
    )
```

The results are pretty good, with estimates close to the true values:

However, I will eventually have to fit a *huge* amount of data (30k possible values for \zeta and millions of observations), so efficiency is crucial (and I may end up using the variational API). But the discrete parameter \zeta forces PyMC3 to use CategoricalGibbsMetropolis, which makes sampling slower than it could be. Is there some way to manipulate the model so that, instead of sampling \zeta directly, a probability vector over its values is sampled?

All the cases I've seen (e.g. here) use a Normal mixture, but those models include a Dirichlet distribution even in the non-marginalized version, so I am unsure how to adapt them to my model above. I am also unsure whether marginalizing \zeta out is the right approach at all, since \zeta is precisely the value I am interested in!
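Or is the following the way to get \zeta back after marginalizing? Given posterior draws of (\alpha_0, \alpha_1, \sigma), re-weight each possible \zeta by its data likelihood under each draw and average (pure NumPy sketch, names mine; the toy call below plugs in the true parameters rather than real posterior draws):

```
import numpy as np
from scipy.special import logsumexp

def zeta_posterior(a0_draws, a1_draws, sigma_draws, L, category_i, outcome_i):
    """Estimate p(zeta = z | data) by averaging, over posterior draws of
    (a_0, a_1, sigma), the normalized per-z likelihood of the data
    (a uniform prior over zeta cancels in the normalization)."""
    m = L.shape[0]
    probs = np.zeros(m)
    for a0, a1, s in zip(a0_draws, a1_draws, sigma_draws):
        mu = a0 + a1 * L[:, category_i]          # shape (m, n)
        resid = outcome_i[None, :] - mu
        # Gaussian log-likelihood of the whole data set for each z
        logp_z = (-0.5 * (resid / s) ** 2
                  - np.log(s) - 0.5 * np.log(2 * np.pi)).sum(axis=1)
        probs += np.exp(logp_z - logsumexp(logp_z))
    return probs / len(a0_draws)

# toy check: fake data generated with z = 3, evaluated at the true parameters
L = np.array([[1, 2, 1], [3, 2, 1], [1, 4, 5], [5, 6, 8]], dtype=float)
category_i = np.repeat(np.arange(L.shape[1]), 100)
outcome_i = np.random.normal(loc=L[3, category_i], scale=0.1)
p = zeta_posterior([0.0], [1.0], [0.1], L, category_i, outcome_i)
# p should put almost all of its mass on z = 3
```

Is this post-processing correct, or does marginalization lose information about \zeta that cannot be recovered this way?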

Thank you very much for your help, any pointers/hints would be much appreciated!