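The problem is that you are passing random variables, instead of probability distributions, as the mixture's `comp_dists`. You can either build the list of distributions directly:

```python
bernoulli_dists = [pm.Bernoulli.dist(mu[i, :], shape=P) for i in range(K_THRESH)]
```

`pm.Bernoulli.dist` returns an unnamed distribution object, so it does not add extra `K_i` random variables to the model the way `pm.Bernoulli('K_'+str(i), ...).distribution` would.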
Or you can keep your existing `bernoulli_dists` and change the `obs` line like this:

```python
obs = pm.Mixture('obs', w, [b.distribution for b in bernoulli_dists], observed=df)
```
About the likelihood function not being picklable: did you define it as a top-level function? Python's `pickle` module stores functions by reference (their module and qualified name), so lambdas, nested functions, and closures cannot be pickled. Make sure you write the function in a way that lets it be pickled.
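A minimal, PyMC-independent sketch of that restriction, using only the standard library. `loglike`, `make_loglike`, and `can_pickle` are hypothetical names chosen for illustration, not part of your model: the top-level function round-trips through `pickle`, while a nested function or a lambda does not.

```python
import pickle


def loglike(theta, data):
    """Top-level function: pickle stores it by module and qualified name."""
    return -sum((x - theta) ** 2 for x in data)


def make_loglike(data):
    # Nested function (closure): it has no importable qualified name,
    # so the standard pickle module refuses to serialize it.
    def inner(theta):
        return loglike(theta, data)

    return inner


def can_pickle(obj):
    """Return True if pickle.dumps succeeds on obj, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


if __name__ == "__main__":
    print(can_pickle(loglike))                   # True
    print(can_pickle(make_loglike([1.0, 2.0])))  # False
    print(can_pickle(lambda theta: theta))       # False
```

If your likelihood needs extra data, passing it as an explicit argument to a top-level function (as `loglike` does here) keeps it picklable, whereas capturing it in a closure does not.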
About cluster membership: @junpenglao answered this elsewhere, but I can't find the particular thread to link to. Basically, you can compute the `obs.comp_dist.logp` values of each row of `observed` for each mixture component separately. This gives you an indication of which component each observation most likely belongs to.
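A rough NumPy sketch of that idea, written without the PyMC3 API. It assumes each row is generated by a single component whose `P` columns are independent Bernoullis, and that you have point estimates (for example posterior means) of `mu` with shape `(K, P)` and `w` with shape `(K,)`; the function names are hypothetical. It computes the per-component log-likelihood of each row and then, going one step beyond raw `logp` values, weights by `w` and normalizes to get membership probabilities (the usual "responsibilities" of a mixture).

```python
import numpy as np


def component_loglik(X, mu):
    """Log-likelihood of each row of X under each Bernoulli component.

    X:  (N, P) array of 0/1 observations
    mu: (K, P) array of per-component success probabilities
    Returns an (N, K) array: sum over P of log Bernoulli(x | mu_k).
    """
    X = np.asarray(X, dtype=float)
    log_mu = np.log(mu)
    log_1m_mu = np.log1p(-mu)
    # (N, P) @ (P, K) -> (N, K)
    return X @ log_mu.T + (1.0 - X) @ log_1m_mu.T


def responsibilities(X, mu, w):
    """Membership probabilities, proportional to w_k * p(x | mu_k)."""
    log_r = component_loglik(X, mu) + np.log(w)
    log_r -= log_r.max(axis=1, keepdims=True)  # stabilize before exp
    r = np.exp(log_r)
    return r / r.sum(axis=1, keepdims=True)


X = np.array([[1, 1, 0], [0, 0, 1]])
mu = np.array([[0.9, 0.8, 0.1], [0.1, 0.2, 0.9]])
w = np.array([0.5, 0.5])
r = responsibilities(X, mu, w)
print(r.argmax(axis=1))  # [0 1]
```

With posterior samples instead of point estimates, you could compute `r` per draw and average, which also shows how uncertain each assignment is.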