Wrapping a scipy distribution with DensityDist

Dear all,

I’m trying to wrap a scipy distribution with DensityDist. (I’m aware of the performance drawbacks compared to implementing this as a proper new distribution / Aesara Op.)

Below is what I have tried:

from aesara.compile.ops import as_op
import aesara.tensor as at
import scipy.stats
import pymc as pm
import numpy as np

# Data with: M, n, N, odds = 140, 80, 60, 0.8
data = np.array([35, 26, 35, 31, 27, 30, 27, 35, 29, 38, 27, 34, 33, 30, 29, 29, 31,
       31, 34, 31, 35, 30, 34, 29, 33, 34, 34, 32, 33, 30, 29, 33, 31, 31,
       35, 34, 33, 30, 34, 35, 31, 31, 32, 30, 33, 29, 35, 34, 30, 30, 31,
       34, 31, 31, 32, 29, 31, 30, 26, 33, 31, 28, 34, 30, 28, 28, 29, 32,
       35, 34, 36, 30, 35, 30, 26, 31, 30, 37, 30, 30, 35, 34, 28, 30, 30,
       33, 34, 31, 35, 34, 32, 32, 36, 30, 35, 32, 34, 32, 31, 37])

with pm.Model() as model:
        
    @as_op(itypes=[at.dvector], otypes=[at.dvector])
    def logp_(value):
        return scipy.stats.nchypergeom_wallenius(140, 80, 60, odds=odds.eval()).logpmf(value)
    
    @as_op(itypes=[at.dvector], otypes=[at.dvector])
    def logcdf_(value):
        return scipy.stats.nchypergeom_wallenius(140, 80, 60, odds=odds.eval()).logcdf(value)
    
    @as_op(itypes=[at.dvector], otypes=[at.ivector])
    def random_(value):
        return scipy.stats.nchypergeom_wallenius(140, 80, 60, odds=odds.eval()).rvs()
    
    odds = pm.Beta('odds', 1, 1)
    y = pm.DensityDist("WNCH", logp=logp_, logcdf=logcdf_, random=random_, observed=data)                 
    trace = pm.sample()

It samples, but it does not fit odds, which apparently is just sampled from the prior. I’m really unsure about calling .eval() when passing odds to scipy, but without it there are errors, which makes some sense since scipy cannot work with symbolic variables. I’m also unsure about the input types being vectors. I tried scalars, hoping that broadcasting would take care of the multiple observations, but with scalars I get shape errors.
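(For reference, odds.eval() returns a concrete numeric value rather than a symbolic one, which I suspect is part of the problem. A minimal sketch of what I mean, assuming PyMC v4 with Aesara:)

import pymc as pm

with pm.Model():
    odds = pm.Beta("odds", 1, 1)
    fixed_odds = odds.eval()  # a concrete number, detached from the sampler's updates
    print(fixed_odds)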

Any help or hints would be highly appreciated!

Many thanks
Jan


I could be wrong, but I think odds needs to be passed into the logp as an argument. See this thread for more info. So something like this?

@as_op(itypes=[at.dvector, at.dscalar], otypes=[at.dvector])
def logp_(value, odds):
    return scipy.stats.nchypergeom_wallenius(140,
                                             80,
                                             60,
                                             odds=odds).logpmf(value)

odds = pm.Beta('odds', 1, 1)
y = pm.DensityDist("WNCH",
                   odds,
                   logp=logp_,
                   logcdf=logcdf_,
                   random=random_,
                   observed=data)                 

Great, it works! Many thanks @cluhmann


Here is a working example with all parameters of this distribution, in case it is helpful for someone. Also note that random_ is a plain function rather than an Op, unlike in the earlier snippet.

from aesara.compile.ops import as_op
import aesara.tensor as at
import scipy.stats
import pymc as pm
from pymc.aesaraf import floatX, intX


with pm.Model() as model:
    
    @as_op(itypes=[at.dvector, at.ivector, at.ivector, at.iscalar, at.dscalar], otypes=[at.dvector])
    def logp_(value, M, n, N, odds):
        return scipy.stats.nchypergeom_wallenius(M, n, N, odds).logpmf(value)
    
    @as_op(itypes=[at.dvector, at.ivector, at.ivector, at.iscalar,  at.dscalar], otypes=[at.dvector])
    def logcdf_(value, M, n, N, odds):
        return scipy.stats.nchypergeom_wallenius(M, n, N, odds).logcdf(value)
    
    def random_(M, n, N, odds, rng=None, size=None):
        return scipy.stats.nchypergeom_wallenius(M, n, N, odds=odds).\
                      rvs(random_state=rng, size=size)
    
    M =  at.as_tensor(intX(data['M'].values))
    n =  at.as_tensor(intX(data['n'].values))
    N =  at.as_tensor(intX(1))
    
    logodds = pm.Normal('logodds', 0, 1.5)
    p = pm.Deterministic('p', pm.invlogit(logodds))
    odds = pm.Deterministic('odds', at.exp(logodds))
    
    
    y = pm.DensityDist("WNCH", M, n, N, odds, logp=logp_, logcdf=logcdf_,
                        random=random_, observed=data['Obs'].values)   
    trace = pm.sample()
    trace.extend(pm.sample_prior_predictive())

You should use the rng in the scipy call. It can be passed as the random_state keyword to rvs, if I am not mistaken. Otherwise the draws won’t be seedable.
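For example, a small sketch outside of PyMC with the same scipy distribution (hypothetical seed):

import numpy as np
import scipy.stats

d = scipy.stats.nchypergeom_wallenius(140, 80, 60, odds=0.8)
# same seed -> same draws, which is what makes the DensityDist seedable
print(d.rvs(random_state=np.random.default_rng(42), size=3))
print(d.rvs(random_state=np.random.default_rng(42), size=3))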


Good point, thanks! What about size? It seems like scipy handles the shapes just fine, but perhaps size is required to draw posterior predictive samples of different sizes?

Yes, size should also be passed. In cases where the shape of the distribution goes beyond what is implied by the parameters, your function would otherwise return fewer draws than expected.
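A quick sketch of why forwarding size matters (hypothetical shapes): predictive sampling can request several replicates of the full data set at once, beyond what the parameter shapes imply.

import numpy as np
import scipy.stats

rng = np.random.default_rng(0)
d = scipy.stats.nchypergeom_wallenius(140, 80, 60, odds=0.8)
# 4 predictive draws of 100 observations each
print(d.rvs(random_state=rng, size=(4, 100)).shape)  # (4, 100)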
