How to use a Deterministic in a mixture?


#1

My goal: I’d like to use a custom distribution as a mixture component.

This is my ‘basic’ code. It works OK to build a log-normal mixture model that fits (more or less) my data:

# inside a `with pm.Model():` block; `data` holds the observed values
l_nbr = 3  # number of components
l_mu = pm.HalfNormal('l_mu', sd=1, shape=l_nbr)
l_sd = pm.HalfNormal('l_sd_1', sd=1, shape=l_nbr)
l_comp = pm.Lognormal.dist(mu=l_mu, sd=l_sd, shape=l_nbr)
l_w = pm.Dirichlet('l_w_1', a=np.array([1] * l_nbr))
l_mix = pm.Mixture('l_mix', w=l_w, comp_dists=l_comp, observed=data)

But I expect that using a log-normal with a small offset would fit even better,
so I create a new random variable ‘l_offset’:

l_offset = pm.HalfNormal('l_offset', sd=1, shape=l_nbr)

… and I try to add it to my previous mixture components:

l_comp = l_comp + l_offset

The rest of the code is unchanged.

I get the following error on the line with the sum:
AsTensorError: ('Cannot convert <pymc3.distributions.continuous.Lognormal object at 0x7f5168092390> to TensorType', <class 'pymc3.distributions.continuous.Lognormal'>)

I understand it as: it is impossible to add a sampled RV (the offset) to the components (the log-normal distributions), because the components are not (directly) sampled…

Any idea to work around this problem would be appreciated.


#2

I don’t think that’s possible, as the components in a mixture distribution must be density functions. You will have to write a lognormal_plus_offset logp function in this case.
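Something along these lines, just as a sketch that reuses the built-in Lognormal logp (with pymc3 imported as pm; the function and argument names are only illustrative):

def lognormal_plus_offset_logp(mu, sd, offset):
    # shifting a log-normal by a constant offset only shifts where the
    # density is evaluated, so we can reuse the built-in logp
    base = pm.Lognormal.dist(mu=mu, sd=sd)
    def logp(value):
        return base.logp(value - offset)
    return logp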


#3

OK, as a first try, I will only write my ‘own’ logp function for the log-normal mixture (I’ll try to add the offset once this step works):

d = pm.Lognormal.dist(mu=l_mu,sd=l_sd,shape=l_nbr) # only used to access the log-normal logp function
l_comp = pm.DensityDist('l_comp',d.logp,shape=l_nbr)

The rest of the basic mixture code remains the same.

I got this error message, occurring on the line that builds the mixture:
ValueError: length not known: l_comp [id A]

And if I try with:

l_comp = pm.DensityDist.dist(d.logp,shape=l_nbr)

Then the error message is (still on the line that builds the mixture):

TypeError: 'DensityDist' object is not iterable

What am I doing wrong?


#4

I think the issue is that pm.DensityDist needs an observed value. Look at this discussion:


#5

DensityDist also works without observed: https://github.com/pymc-devs/pymc3/blob/master/pymc3/examples/custom_dists.py

What I meant in the other post was that if the model is not evaluated on some observed data, it will just sample from the prior.
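For example, something like this just samples x from its own density (a minimal sketch, not the code from the linked example):

with pm.Model():
    # unnormalized standard-normal logp; no observed, so x is sampled as a prior
    x = pm.DensityDist('x', lambda value: -0.5 * value ** 2)
    trace = pm.sample(500)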


#6

Thanks everyone for your remarks. You helped me build my own distribution with ‘DensityDist’!
So now I am able to build a log-normal distribution with an offset.
I am very happy about that, but I still have a problem obtaining the PPC (posterior predictive check).

This code provides the logp and random functions for my distribution:

def my_logp(dist, offset):
    def foo(x):
        # shift the value back by the offset before evaluating the log-normal logp
        return dist.logp(x - offset)
    return foo

def my_rand(dist, offset, *args, **kwargs):
    def foo(*args, **kwargs):
        # draw from the log-normal, then add the offset
        r = dist.random(*args, **kwargs)
        return r + offset
    return foo

where ‘dist’ is the log-normal distribution and ‘offset’ is the offset random variable; both are created below (in the model):

mu = pm.Bound(pm.Flat,lower=0.001,upper=10)('mu')
sd = pm.Bound(pm.Flat,lower=0.001,upper=10)('sd')
l_dist = pm.Lognormal.dist(mu=mu,sd=sd)
offset = pm.HalfNormal('offset',sd=1)

Now, the following line models a log-normal with an offset:

my_dist = pm.DensityDist('my_dist',logp=my_logp(l_dist,offset),random=my_rand(l_dist,offset),observed=data)

This runs OK, but I have a problem building the PPC (posterior predictive check)…

ppc = pm.sample_ppc(trace=trace, vars=[my_dist], model=model, samples=len(data), size=1)

This line seems to work (the progress bar runs), but there is a problem…
It’s as if the PPC is never evaluated!
This is what I get when I print the PPC:

{'my_dist': array([Elemwise{add,no_inplace}.0, Elemwise{add,no_inplace}.0,
        Elemwise{add,no_inplace}.0, ..., Elemwise{add,no_inplace}.0,
        Elemwise{add,no_inplace}.0, Elemwise{add,no_inplace}.0],
       dtype=object)}

I guess this problem is linked to the custom random number generator provided for ‘my_dist’… but I can’t find the solution myself…

Any help appreciated.


#7

In the random method definition, you are adding a symbolic tensor (the offset RV) to the drawn values, which gives the error. Maybe try:

def my_rand(dist, size=None):
    def foo(point, size=None):
        offset_val = point['offset']
        r = dist.random(point=point, size=size)
        return r+offset_val
    return foo
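Then you could wire it up like this (a sketch that reuses the my_logp, l_dist, offset and data from your model above):

my_dist = pm.DensityDist('my_dist',
                         logp=my_logp(l_dist, offset),
                         random=my_rand(l_dist),
                         observed=data)
# sample_ppc should now return numeric arrays instead of unevaluated
# Elemwise objects, because the offset is read from the point dict as a
# number rather than added as a symbolic tensor.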

#8

Thanks, this helped me solve the problem!

But now I must confess that I have doubts about the ‘my_logp’ function (in the previous message)… Is it OK? How can I check it?
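Maybe I could check it against scipy for fixed parameter values, something like this (just an idea, with made-up numbers)?

import numpy as np
import scipy.stats as st

mu, sd, offset = 0.5, 0.3, 2.0
base = pm.Lognormal.dist(mu=mu, sd=sd)
logp = my_logp(base, offset)  # the function from my earlier post

x = np.array([2.5, 3.0, 4.0])  # test points above the offset
pymc_vals = logp(x).eval()
scipy_vals = st.lognorm.logpdf(x - offset, s=sd, scale=np.exp(mu))
print(np.allclose(pymc_vals, scipy_vals))  # should print True if my_logp is right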


#9

OK, now I understand how to use a ‘DensityDist’ to build a single distribution (a log-normal) with an offset…

These are the logp & random functions:

def my_logp(dist,offset):
    def foo(x):
        return dist.logp(x-offset)
    return foo

def my_rand(dist,size=None):
    def foo(point,size=None):
        r = dist.random(point=point, size=size)
        return r+point['offset']
    return foo

But now I’m still struggling to build a mixture with this ‘DensityDist’.

Basically, I’m just adding the shape information to all my random variables:

mu = pm.Bound(pm.Flat,lower=0.001,upper=10)('mu',shape=l_nbr)
sd = pm.Bound(pm.Flat,lower=0.001,upper=10)('sd',shape=l_nbr)
l_dist = pm.Lognormal.dist(mu=mu,sd=sd,shape=l_nbr)
offset = pm.HalfNormal('offset',sd=1,shape=l_nbr)

l_comp = pm.DensityDist('l_comp', logp=my_logp(l_dist, offset), random=my_rand(l_dist), shape=l_nbr)
l_mix = pm.Mixture('l_mix', w=np.ones(l_nbr) / l_nbr, comp_dists=l_comp, observed=data)

But this leads to an error:

ValueError: length not known: l_comp [id A]

And if I change this line (I’m not sure whether a DensityDist needs that extra ‘.dist’…?):

l_comp = pm.DensityDist.dist(logp=l_dist.logp,random=l_dist.random,shape=l_nbr)

Then I get this error:

TypeError: 'DensityDist' object is not iterable

I’m guessing that there is a dimension problem here; I’m not sure that my random-variable component is understood as being multi-dimensional…