Get the Likelihood Generative Distribution of Observation from a Mixture Model


Sorry if this is a basic question, but I haven’t been able to find an answer for this anywhere. I’ve fit a mixture model using a combination of Von Mises and Uniform distributions (code below) using a set of observations. Now I want to see whether each observation is more likely to have come from the Von Mises or a Uniform distribution - similar to pomegranate’s “predict” method.

        with model:
            # uniform component over the full circle
            uni = pm.Uniform.dist(lower=-np.pi, upper=np.pi)
            kappa = pm.Uniform('kappa', lower=0, upper=200)
            # mu = pm.Normal('mu', mu=0, sd=3)
            # Von Mises component centred at 0
            von = pm.VonMises.dist(mu=0, kappa=kappa)
            # mixture model
            w = pm.Dirichlet('w', a=np.array([1, 1]), shape=2)
            like = pm.Mixture('like', w=w, comp_dists=[von, uni], observed=data)

We don't have a built-in function for that. A workaround is to wrap the per-component log-likelihood of the mixture in a compiled function, something like:

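import numpy as np
import theano

# per-component log-likelihood of each observation, shape (n_obs, n_components)
complogp = like.distribution._comp_logp(theano.shared(data))
f_complogp = model.model.fastfn(complogp)
y_ = []
for point in trace:
    # index of the most likely component for each observation (0 = von, 1 = uni)
    y_.append(np.argmax(f_complogp(point), axis=1))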

Hope this helps!
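For intuition, here is a minimal NumPy-only sketch of what the per-component comparison computes for a single set of parameter values. It is not PyMC3 code: the helper names (`vonmises_logpdf`, `uniform_logpdf`, `component_logp`, `responsibilities`) and the example values for `x`, `kappa` and `w` are made up for illustration. As far as I can tell, the component log-likelihoods do not include the mixture weights, so the sketch also shows how adding `log(w)` would give the posterior probability of each component.

```python
import numpy as np

def vonmises_logpdf(x, mu, kappa):
    """Log density of a Von Mises distribution on the circle."""
    return kappa * np.cos(x - mu) - np.log(2.0 * np.pi * np.i0(kappa))

def uniform_logpdf(x, lower=-np.pi, upper=np.pi):
    """Log density of a uniform distribution on [lower, upper]."""
    inside = (x >= lower) & (x <= upper)
    return np.where(inside, -np.log(upper - lower), -np.inf)

def component_logp(x, kappa, mu=0.0):
    """Return an (n_obs, 2) array: column 0 = Von Mises, column 1 = uniform."""
    x = np.asarray(x, dtype=float)
    return np.stack([vonmises_logpdf(x, mu, kappa), uniform_logpdf(x)], axis=-1)

def responsibilities(x, kappa, w, mu=0.0):
    """Posterior probability of each component, including the weights w."""
    logp = component_logp(x, kappa, mu) + np.log(np.asarray(w, dtype=float))
    logp = logp - logp.max(axis=-1, keepdims=True)  # stabilise before exp
    p = np.exp(logp)
    return p / p.sum(axis=-1, keepdims=True)

# Made-up example values (assumptions, not taken from the fitted model)
x = np.array([0.05, -0.1, 2.5, -3.0])  # angles in radians
kappa, w = 10.0, [0.5, 0.5]

labels = np.argmax(component_logp(x, kappa), axis=1)  # ignores the weights
resp = responsibilities(x, kappa, w)                  # accounts for the weights
```

With equal weights the two approaches agree. With unequal weights, `np.argmax(resp, axis=1)` can differ from `labels`, so it is worth deciding which of the two you actually want.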

Thanks for the help! I’m getting an error back, though, when I pass point to the f_complogp function. I ran a trace with the model (which I thought was implicit in your for-loop). This makes a MultiTrace object that I can loop over, passing each dict in the trace to f_complogp. However, I’m getting this error:

    796             if len(args) + len(kwargs) > len(self.input_storage):
    797                 raise TypeError("Too many parameter passed to theano function")
    799             # Set positional arguments
    TypeError: Too many parameter passed to theano function

This stems directly from trying to pass these dicts to the f_complogp function. Any idea what this is about?

Yes, the problem is that each point in a trace object also contains the deterministic (back-transformed) values, whereas the compiled theano function only accepts the “raw” free variables of the model as inputs. Something like the following should work:

import numpy as np
import theano

complogp = like.distribution._comp_logp(theano.shared(data))
f_complogp = model.model.fastfn(complogp)
testpoint = model.test_point  # contains only the "raw" free variables
y_ = []
for i in range(trace.nchains):  # to get all the points from a multi trace
    tr = trace._straces[i]
    for point in tr:
        # keep only the variables that the compiled function accepts
        d2 = {k: v for k, v in point.items() if k in testpoint}
        # get prediction
        y_.append(np.argmax(f_complogp(d2), axis=1))
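To see why the filtering step matters, here is a small plain-Python sketch of the same failure and fix, with no theano involved. `compiled_fn` is a hypothetical stand-in for the compiled function, and the variable names in `point` (`kappa_interval__`, `w_stickbreaking__`) are my assumption about how the transformed variables would be named in this model; the values are made up.

```python
def compiled_fn(kappa_interval__, w_stickbreaking__):
    """Hypothetical stand-in for a function that accepts only free variables."""
    return kappa_interval__ + sum(w_stickbreaking__)

# Hypothetical trace point: free variables plus their back-transformed values
point = {
    "kappa_interval__": -1.2,
    "w_stickbreaking__": [0.3],
    "kappa": 46.3,
    "w": [0.57, 0.43],
}
# Names the function accepts (plays the role of model.test_point's keys)
allowed = {"kappa_interval__", "w_stickbreaking__"}

# Passing the full point fails because of the extra keys
try:
    compiled_fn(**point)
    error_message = ""
except TypeError as err:
    error_message = str(err)

# Filtering the point down to the accepted names makes the call succeed
filtered = {k: v for k, v in point.items() if k in allowed}
result = compiled_fn(**filtered)
```

The real theano error message is worded differently, but the cause is the same: the dict carries more entries than the function has inputs.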