I think there is a way to marginalize instead of indexing. The thought behind it is that when you compute the posterior expectation of ncomp_aCH or ncomp_aCOH, it's a weight vector with n elements that sums to 1. So instead of restricting the vector to take values of only 0 or 1 (i.e., an index vector), let it take values between 0 and 1:
import numpy as np
import pymc3 as pm

with pm.Model() as basic_model:
    # Priors for unknown model parameters
    b1 = pm.Uniform('b1', lower=0.3, upper=0.5)
    # Sparse mixture weights over the n candidate components
    ncomp_aCH = pm.Dirichlet('ncomp_aCH', a=np.ones(n) / n)
    ncomp_aCOH = pm.Dirichlet('ncomp_aCOH', a=np.ones(n) / n)
    # Weighted combination of the candidates instead of indexing one of them
    aCH = aCH_.dot(ncomp_aCH)
    aCOH = aCOH_.dot(ncomp_aCOH)
    out = b1 * aCH + aCOH
    # Likelihood (sampling distribution) of observations
    Y_obs = pm.Normal('Y_obs', mu=out, tau=sigma, observed=Y)
    trace = pm.sample(1000, tune=2000)
By using a Dirichlet with alpha < 1, the resulting weight vector will be sparse (with the extreme being an index vector).
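To see the sparsity effect, here is a minimal sketch in plain NumPy (with a hypothetical n = 5) comparing draws with alpha < 1 against draws with alpha = 1:

import numpy as np

rng = np.random.RandomState(42)
n = 5  # hypothetical number of candidate components

# With alpha < 1, most of the mass concentrates on a few elements...
sparse_draw = rng.dirichlet(np.ones(n) / n)
# ...while alpha = 1 gives a uniform draw over the simplex.
flat_draw = rng.dirichlet(np.ones(n))

print(sparse_draw)  # most entries near 0, one close to 1
print(flat_draw)    # entries more evenly spread
print(sparse_draw.sum(), flat_draw.sum())  # both sum to 1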
However, the above model is difficult to sample (lots of divergences). It helps to reparameterize, using the fact that normalizing n independent Gamma(alpha, 1) draws yields a Dirichlet(alpha) vector:
import theano.tensor as tt

Dirprior = 1. / n
with pm.Model() as basic_model:
    # Priors for unknown model parameters
    b1 = pm.Uniform('b1', lower=0.3, upper=0.5)
    # Dirichlet weights reparameterized as normalized Gammas
    beta1 = pm.Gamma('beta1', alpha=Dirprior, beta=1., shape=n)
    ncomp_aCH = pm.Deterministic('ncomp_aCH', beta1 / tt.sum(beta1))
    beta2 = pm.Gamma('beta2', alpha=Dirprior, beta=1., shape=n)
    ncomp_aCOH = pm.Deterministic('ncomp_aCOH', beta2 / tt.sum(beta2))
    aCH = aCH_.dot(ncomp_aCH)
    aCOH = aCOH_.dot(ncomp_aCOH)
    out = b1 * aCH + aCOH
    # Likelihood (sampling distribution) of observations
    Y_obs = pm.Normal('Y_obs', mu=out, tau=sigma, observed=Y)
    trace = pm.sample(1000, tune=2000)

pm.forestplot(trace, varnames=['b1', 'ncomp_aCH', 'ncomp_aCOH']);
The result is quite similar to using indexing, but IMO this model is much better.
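As a quick check (a sketch; it assumes the trace above and NUTS sampler stats as exposed in PyMC3), you can count divergences and see which candidate component carries most of the posterior weight:

# Number of divergent transitions (should drop after reparameterization)
print(trace['diverging'].sum())

# Posterior mean of the weight vectors; with a sparse prior, one
# component typically dominates, recovering the "index" answer.
w_aCH = trace['ncomp_aCH'].mean(axis=0)
w_aCOH = trace['ncomp_aCOH'].mean(axis=0)
print(w_aCH.argmax(), w_aCOH.argmax())  # most likely candidate indices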