Wrapping a scipy distribution with DensityDist

Dear all,

I’m trying to wrap a scipy distribution with DensityDist. (I’m aware of the performance drawbacks compared to implementing this as a proper new distribution / Aesara Op.)

Below is what I have tried:

from aesara.compile.ops import as_op
import aesara.tensor as at
import scipy.stats
import pymc as pm
import numpy as np

# Data with: M, n, N, odds = 140, 80, 60, 0.8
data = np.array([35, 26, 35, 31, 27, 30, 27, 35, 29, 38, 27, 34, 33, 30, 29, 29, 31,
       31, 34, 31, 35, 30, 34, 29, 33, 34, 34, 32, 33, 30, 29, 33, 31, 31,
       35, 34, 33, 30, 34, 35, 31, 31, 32, 30, 33, 29, 35, 34, 30, 30, 31,
       34, 31, 31, 32, 29, 31, 30, 26, 33, 31, 28, 34, 30, 28, 28, 29, 32,
       35, 34, 36, 30, 35, 30, 26, 31, 30, 37, 30, 30, 35, 34, 28, 30, 30,
       33, 34, 31, 35, 34, 32, 32, 36, 30, 35, 32, 34, 32, 31, 37])

with pm.Model() as model:
        
    @as_op(itypes=[at.dvector], otypes=[at.dvector])
    def logp_(value):
        return scipy.stats.nchypergeom_wallenius(140, 80, 60, odds=odds.eval()).logpmf(value)
    
    @as_op(itypes=[at.dvector], otypes=[at.dvector])
    def logcdf_(value):
        return scipy.stats.nchypergeom_wallenius(140, 80, 60, odds=odds.eval()).logcdf(value)
    
    @as_op(itypes=[at.dvector], otypes=[at.ivector])
    def random_(value):
        return scipy.stats.nchypergeom_wallenius(140, 80, 60, odds=odds.eval()).rvs()
    
    odds = pm.Beta('odds', 1, 1)
    y = pm.DensityDist("WNCH", logp=logp_, logcdf=logcdf_, random=random_, observed=data)                 
    trace = pm.sample()

It samples, but it does not fit odds, which apparently is just sampled from the prior. I’m really unsure about calling .eval() when passing odds to scipy, but without it there are errors, which makes some sense since scipy cannot work with symbolic variables. I’m also unsure about the input types being vectors. I tried scalars, hoping that broadcasting would take care of the multiple observations, but with scalars I get shape errors.
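(For reference, odds.eval() returns a concrete numeric value rather than a symbolic one, which I suspect is part of the problem. A minimal sketch of what I mean, assuming PyMC v4 with Aesara:)

import pymc as pm

with pm.Model():
    odds = pm.Beta("odds", 1, 1)
    fixed_odds = odds.eval()  # a concrete number, detached from the sampler's updates
    print(fixed_odds)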

Any help or hints would be highly appreciated!

Many thanks
Jan


I could be wrong, but I think odds needs to be passed into the logp as an argument. See this thread for more info. So something like this?

@as_op(itypes=[at.dvector, at.dscalar], otypes=[at.dvector])
def logp_(value, odds):
    return scipy.stats.nchypergeom_wallenius(140,
                                             80,
                                             60,
                                             odds=odds).logpmf(value)

odds = pm.Beta('odds', 1, 1)
y = pm.DensityDist("WNCH",
                   odds,
                   logp=logp_,
                   logcdf=logcdf_,
                   random=random_,
                   observed=data)                 

Great, it works! Many thanks @cluhmann


Here is a working example with all parameters of this distribution, in case it is helpful for someone. Also note that random_ is a plain function rather than an Op, unlike in the earlier snippet.

from aesara.compile.ops import as_op
import aesara.tensor as at
import scipy.stats
import pymc as pm
from pymc.aesaraf import floatX, intX


with pm.Model() as model:
    
    @as_op(itypes=[at.dvector, at.ivector, at.ivector, at.iscalar, at.dscalar], otypes=[at.dvector])
    def logp_(value, M, n, N, odds):
        return scipy.stats.nchypergeom_wallenius(M, n, N, odds).logpmf(value)
    
    @as_op(itypes=[at.dvector, at.ivector, at.ivector, at.iscalar,  at.dscalar], otypes=[at.dvector])
    def logcdf_(value, M, n, N, odds):
        return scipy.stats.nchypergeom_wallenius(M, n, N, odds).logcdf(value)
    
    def random_(M, n, N, odds, rng=None, size=None):
        return scipy.stats.nchypergeom_wallenius(M, n, N, odds=odds).\
                      rvs(random_state=rng, size=size)
    
    M =  at.as_tensor(intX(data['M'].values))
    n =  at.as_tensor(intX(data['n'].values))
    N =  at.as_tensor(intX(1))
    
    logodds = pm.Normal('logodds', 0, 1.5)
    p = pm.Deterministic('p', pm.invlogit(logodds))
    odds = pm.Deterministic('odds', at.exp(logodds))
    
    
    y = pm.DensityDist("WNCH", M, n, N, odds, logp=logp_, logcdf=logcdf_,
                        random=random_, observed=data['Obs'].values)   
    trace = pm.sample()
    trace.extend(pm.sample_prior_predictive())

You should use the rng in the scipy call. It can be passed as the random_state keyword to rvs, if I am not mistaken. Otherwise the draws won’t be seedable.
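For example, a small sketch outside of PyMC with the same scipy distribution (hypothetical seed):

import numpy as np
import scipy.stats

d = scipy.stats.nchypergeom_wallenius(140, 80, 60, odds=0.8)
# same seed -> same draws, which is what makes the DensityDist seedable
print(d.rvs(random_state=np.random.default_rng(42), size=3))
print(d.rvs(random_state=np.random.default_rng(42), size=3))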


Good point, thanks! What about size? It seems like scipy handles the shapes just fine, but perhaps size is required to draw posterior predictive samples of different sizes?

Yes, size should also be passed. In cases where the shape of the distribution goes beyond what is implied by the parameters, your function would otherwise return fewer draws than expected.
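A quick sketch of why forwarding size matters (hypothetical shapes): predictive sampling can request several replicates of the full data set at once, beyond what the parameter shapes imply.

import numpy as np
import scipy.stats

rng = np.random.default_rng(0)
d = scipy.stats.nchypergeom_wallenius(140, 80, 60, odds=0.8)
# 4 predictive draws of 100 observations each
print(d.rvs(random_state=rng, size=(4, 100)).shape)  # (4, 100)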
