Ok, I try your strategy : building my own logp mixture…
This is my plan :
- I build a dataset that is a 50:50 % mixture of two lognormal.
- I build a model composed of 5 lognormal with (exponential) priors on mu and sd.
- Then I create an observable DensityDist, with a call to a custom lopg function…
The problem arise when trying to pass the mixture distribution components as argument to the logp function : I just can’t find a way that works!
This is my code, and the error I get :
import sys
import pymc3 as pm, theano.tensor as tt
import numpy as np
import matplotlib.pyplot as plt
# simulated data : a mix of 2 lognormal
data = np.concatenate( (
np.random.lognormal(mean=3,sigma=0.5,size=1000),
np.random.lognormal(mean=5,sigma=0.5,size=1000)) )
nbr = 5
def logp(components,w, data): # the custom logp function #
lp = 0
for d in data: # for all data point
for i in range(nbr): # for all components of the mixture
c = components[i].logp(d) # logp value of data again each components
lp += (w[i]*c) # integrate weighting
return lp #tt.ones(1)
with pm.Model() as model:
components = []
for i in range(nbr): # build the components list
components.append(pm.Lognormal.dist(mu=pm.Exponential('mu_'+str(i), lam=1)
, sd=pm.Exponential('sd_'+str(i), lam=1)))
w = pm.Dirichlet('w',a=np.array([0.0000001]*nbr)) # weight vector for the mixture
# home-made mixture with a custom logp function
mix = pm.DensityDist('mix',logp,observed={'w':w,'data':data,'components':components})
# MCMC
trace = pm.sample(1000, tune=1000)
# graphes
pm.traceplot(trace)
plt.show()
And this is the error I get :
“TypeError: float() argument must be a string or a number, not ‘Lognormal’”
Which correspond to this line of my code :
mix = pm.DensityDist('mix',logp,observed={'w':w,'data':data,'components':components})
The logp function is certainly not correct too, but I can’t even get to the point to execute it!!
I tried to “tt.stack” or “tt.concatenate” on the “components” list, before the call to “DensityDist”, but these functions only work with random variable (and not “.dist” object).
If I declare the components as “pm.Lognormal” (not pm.Lognormal.dist") then I can tt.stack or tt.concatenate them and pass it to the custom logp function, but then I can’t access the lopg function of the “components”!
(and, anyway, I guess that Pymc3 will automatically add a logp cost from each components wich will lead to a erroneous global logp function)
I’m so clueless … ;(
I need to pass the mixture components to the custom logp function to be able to build the mixture logp, but it seems that DensityDist prevents me to do so.