Your example doesn't really make sense to me… Y is a discrete variable, but it is a mixture of a binomial and a normal? Also, since the mixture component is already observed, I guess you can just do `comp_dist.distribution.logp(value)`
in your `mix_mixlogp`:

```python
# Define the mixture logp by marginalizing over the components
def mix_mixlogp(w, comp_dists):
    def logp_(value):
        # Evaluate the value under every component's logp
        comp_logp = tt.squeeze(tt.stack([comp_dist.distribution.logp(value)
                                         for comp_dist in comp_dists],
                                        axis=1))
        # Weighted log-sum-exp over the mixture components
        return pm.math.logsumexp(tt.log(w) + comp_logp, axis=-1)
    return logp_
```
But I am not sure whether it really makes sense.
The advantage of a mixture model is that you don't need to know which portion of the data comes from which component. No discrete latent label is needed, because each data point is evaluated under all components; the weights (after inference) then tell us which component each data point is most likely to belong to.
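To make the marginalization concrete, here is a small NumPy/SciPy sketch of the same idea, using a hypothetical two-component normal mixture (the weights, means, and data below are made up for illustration). No latent labels appear anywhere: every point is scored under every component, `logsumexp` marginalizes over components, and the posterior "responsibilities" recover which component each point likely came from.

```python
import numpy as np
from scipy.special import logsumexp
from scipy.stats import norm

# Hypothetical mixture: 0.3 * N(-2, 1) + 0.7 * N(3, 1)
w = np.array([0.3, 0.7])
data = np.array([-2.1, 2.8, 3.2])

# Evaluate every data point under every component: shape (n_data, n_components)
comp_logp = np.stack([norm(-2, 1).logpdf(data),
                      norm(3, 1).logpdf(data)], axis=1)

# Marginal log-likelihood per point: logsumexp over components,
# exactly what the logp_ closure above computes symbolically.
marginal_logp = logsumexp(np.log(w) + comp_logp, axis=-1)

# Posterior responsibility of each component for each point;
# rows sum to 1, and the largest entry is the most likely component.
resp = np.exp(np.log(w) + comp_logp - marginal_logp[:, None])
print(resp.round(3))
```

Running this, the first point (near -2) gets almost all of its responsibility from the first component and the other two points from the second, which is the "weight informs us which component each point belongs to" part in practice.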