Mixture of multivariate Bernoullis

The problem is that you are passing random variables, rather than probability distributions, as the mixture's comp_dists. You can either build the list of distributions like this:

bernoulli_dists = [pm.Bernoulli('K_'+str(i), mu[i, :], shape=P).distribution for i in range(K_THRESH)]

Or you can change the obs line like this:

obs = pm.Mixture('obs', w, [b.distribution for b in bernoulli_dists], observed=df)

About the likelihood function not being pickleable: did you define it as a top-level function? Python restricts what can and cannot be pickled (for example, lambdas and functions nested inside other functions cannot be pickled with the standard pickle module), so you have to make sure you write the function in a way that lets it be pickled.
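To illustrate the pickling restriction, here is a minimal sketch using only the standard library. The function names (top_level_logp, make_nested_logp, can_pickle) and the toy log-density are hypothetical stand-ins, not your actual likelihood. A function defined at module level is pickled by reference to its name, whereas a nested function or a lambda has no importable name and fails.

```python
import pickle


def top_level_logp(x):
    """Hypothetical stand-in for a custom likelihood defined at module level."""
    return -0.5 * x ** 2


def make_nested_logp():
    """Return a function defined inside another function (not importable by name)."""
    def nested_logp(x):
        return -0.5 * x ** 2
    return nested_logp


def can_pickle(obj):
    """Return True if the standard pickle module can serialize obj."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


# Module-level functions are looked up by name when pickled.
top_level_ok = can_pickle(top_level_logp)
# Nested functions and lambdas cannot be found by name, so pickling fails.
nested_ok = can_pickle(make_nested_logp())
lambda_ok = can_pickle(lambda x: -0.5 * x ** 2)

print("top level:", top_level_ok)
print("nested:", nested_ok)
print("lambda:", lambda_ok)
```

If your likelihood needs extra parameters, one option is to keep it at module level and bind the parameters with functools.partial, which pickles as long as the wrapped function and its arguments do.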

About the cluster membership, @junpenglao answered this elsewhere, but I can’t seem to find the particular thread to link to. Basically, you can compute the obs.comp_dist.logp values of each row of the observed data for each mixture component separately. This gives you an idea of which component each observation most likely belongs to.
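The same idea can be sketched outside the model with plain NumPy. This is only an illustration under assumptions: mu (shape K x P) and w (length K) stand for point estimates such as posterior means, the function name is hypothetical, and adding the log weights and normalizing (to get membership probabilities rather than raw per-component logp values) is an extra step beyond what is described above.

```python
import numpy as np


def bernoulli_mixture_responsibilities(X, mu, w):
    """Per-row membership probabilities for a mixture of multivariate Bernoullis.

    X  : (N, P) binary observations
    mu : (K, P) success probabilities per component (assumed point estimates)
    w  : (K,)   mixture weights (assumed point estimates)
    """
    X = np.asarray(X, dtype=float)
    # Clip to avoid log(0) when a probability is exactly 0 or 1.
    mu = np.clip(np.asarray(mu, dtype=float), 1e-12, 1 - 1e-12)
    w = np.asarray(w, dtype=float)

    # log p(x_n | k) = sum_p [ x_np * log(mu_kp) + (1 - x_np) * log(1 - mu_kp) ]
    log_lik = X @ np.log(mu).T + (1.0 - X) @ np.log1p(-mu).T  # shape (N, K)

    # Add log weights, then normalize each row in log space for stability.
    log_joint = log_lik + np.log(w)
    log_norm = np.logaddexp.reduce(log_joint, axis=1, keepdims=True)
    return np.exp(log_joint - log_norm)


# Toy example with K=2 components and P=3 binary features.
mu = np.array([[0.9, 0.9, 0.1],
               [0.1, 0.1, 0.9]])
w = np.array([0.5, 0.5])
X = np.array([[1, 1, 0],
              [0, 0, 1]])

resp = bernoulli_mixture_responsibilities(X, mu, w)
most_likely = resp.argmax(axis=1)  # index of the most probable component per row
print(resp)
print(most_likely)
```

With posterior samples instead of point estimates, you could apply the function to each draw and average the resulting probabilities, keeping in mind that label switching between chains can make the component indices inconsistent.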
