How can we build a mixture of mixtures?

Ok, I try your strategy : building my own logp mixture…

This is my plan :

  • I build a dataset that is a 50:50 % mixture of two lognormal.
  • I build a model composed of 5 lognormal with (exponential) priors on mu and sd.
  • Then I create an observable DensityDist, with a call to a custom lopg function…

The problem arise when trying to pass the mixture distribution components as argument to the logp function : I just can’t find a way that works!

This is my code, and the error I get :

import sys
import pymc3 as pm, theano.tensor as tt
import numpy as np

import matplotlib.pyplot as plt

# simulated data :  a mix of 2 lognormal
data = np.concatenate( (
    np.random.lognormal(mean=3,sigma=0.5,size=1000),
    np.random.lognormal(mean=5,sigma=0.5,size=1000)) )

nbr = 5

def logp(components,w, data): # the custom logp function     #
    lp = 0
    for d in data: # for all data point
        for i in range(nbr):  # for all components of the mixture
            c = components[i].logp(d) # logp value of data again each components
            lp += (w[i]*c) # integrate weighting
    return lp #tt.ones(1)

with pm.Model() as model:
    components = []
    for i in range(nbr): # build the components list
        components.append(pm.Lognormal.dist(mu=pm.Exponential('mu_'+str(i), lam=1) 
        , sd=pm.Exponential('sd_'+str(i), lam=1)))
    w = pm.Dirichlet('w',a=np.array([0.0000001]*nbr)) # weight vector for the mixture

    # home-made mixture with a custom logp function
    mix = pm.DensityDist('mix',logp,observed={'w':w,'data':data,'components':components})

    # MCMC
    trace = pm.sample(1000, tune=1000)

    # graphes
    pm.traceplot(trace)
    plt.show()

And this is the error I get :
“TypeError: float() argument must be a string or a number, not ‘Lognormal’”

Which correspond to this line of my code :

mix = pm.DensityDist('mix',logp,observed={'w':w,'data':data,'components':components})

The logp function is certainly not correct too, but I can’t even get to the point to execute it!!

I tried to “tt.stack” or “tt.concatenate” on the “components” list, before the call to “DensityDist”, but these functions only work with random variable (and not “.dist” object).
If I declare the components as “pm.Lognormal” (not pm.Lognormal.dist") then I can tt.stack or tt.concatenate them and pass it to the custom logp function, but then I can’t access the lopg function of the “components”!
(and, anyway, I guess that Pymc3 will automatically add a logp cost from each components wich will lead to a erroneous global logp function)

I’m so clueless … ;(

I need to pass the mixture components to the custom logp function to be able to build the mixture logp, but it seems that DensityDist prevents me to do so.