Modeling count differences (Skellam distribution)

Hi,

Does anyone have experience modeling count differences using the Skellam distribution? The Skellam distribution is currently not supported by pymc3 (though it is available in scipy). I am not familiar with the internals of pymc3 and don't know how difficult it is to add a new distribution, but I would be happy to contribute with some guidance. Or perhaps there are alternatives for modeling count differences?

Thanks,
Fabian

The easiest way to model counts or count differences is to use a GLM with a log link function.
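As a sketch of that idea (with the extra assumption, not from your post, that the two underlying counts y1 and y2 are observed separately rather than only their difference), each count gets its own Poisson likelihood with a log link, and the expected difference falls out as a deterministic quantity:

import numpy as np
import pymc3
import theano.tensor as tt

# hypothetical data: both underlying counts observed, not just the difference
x = np.random.normal(size=100)
y1 = np.random.poisson(np.exp(1.0 + 0.5 * x))
y2 = np.random.poisson(np.exp(0.5 - 0.2 * x))

with pymc3.Model():
    b0 = pymc3.Normal('b0', mu=0, sd=10, shape=2)  # intercepts
    b1 = pymc3.Normal('b1', mu=0, sd=10, shape=2)  # slopes
    lam1 = tt.exp(b0[0] + b1[0] * x)  # log link: log(lam) is linear in x
    lam2 = tt.exp(b0[1] + b1[1] * x)
    pymc3.Poisson('y1', mu=lam1, observed=y1)
    pymc3.Poisson('y2', mu=lam2, observed=y2)
    # posterior of the expected count difference
    pymc3.Deterministic('diff', lam1 - lam2)
    trace = pymc3.sample(1000)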

Otherwise, you can define a custom log-likelihood density using pm.DensityDist; you can find an example here. But I am not sure whether there is a numerically stable way to compute the Skellam log-likelihood.
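(For the mechanics only, here is a minimal DensityDist sketch with a hand-written Poisson log-likelihood standing in for the density; the names are illustrative, not from the linked example:)

import numpy as np
import pymc3
import theano.tensor as tt

data = np.random.poisson(5., size=100)

with pymc3.Model():
    lam = pymc3.HalfNormal('lam', sd=10.)

    # DensityDist accepts any function mapping observed values to a log-density
    def logp(value):
        return value * tt.log(lam) - lam - tt.gammaln(value + 1)

    pymc3.DensityDist('obs', logp, observed=data)
    trace = pymc3.sample(1000)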


I tried your second suggestion (see here for an example of using the Skellam distribution in Stan). However, I got stuck getting the evaluation of the modified Bessel function of the first kind to work on theano tensors. Below is the (non-working) code, in case anyone has ideas on how to proceed. In the meantime, I will consider your first suggestion.

import numpy as np
import pymc3
import theano
import pandas
import scipy.stats
import scipy.special

import matplotlib.pyplot as plt
import seaborn as sns

%matplotlib inline

# create dataset
N = 100
X = pandas.DataFrame({'constant':np.ones(N), 'var':np.random.normal(size=N)})
beta_true = pandas.Series({'constant':25, 'var':20})
mu_true = np.dot(X, beta_true)
var_true = 120

# reparameterization: mu  = mu1 - mu2
#                     var = mu1 + mu2

mu1_true = (var_true + mu_true)/2
mu2_true = (var_true - mu_true)/2

Y = scipy.stats.skellam.rvs(mu1_true, mu2_true)

# model
# wrap the modified Bessel function of the first kind as a theano op
# (note: ops created with as_op define no gradient, so NUTS cannot use
# gradient information through this function; the itypes also expect
# float64 vectors, so integer observed data may need casting)
@theano.compile.ops.as_op(itypes=[theano.tensor.dvector, theano.tensor.dvector],
                          otypes=[theano.tensor.dvector])
def iv(a, b):
    return scipy.special.iv(a, b)

def skellam_log(mu1, mu2):
    # Skellam log-pmf:
    # log P(k) = -(mu1 + mu2) + (k / 2) * log(mu1 / mu2) + log I_k(2 * sqrt(mu1 * mu2))
    def _inner_skellam_log(k):
        total = (-mu1 - mu2) + (theano.tensor.log(mu1) - theano.tensor.log(mu2)) * k / 2
        log_prob = total + theano.tensor.log(iv(k, 2 * theano.tensor.sqrt(mu1 * mu2)))
        return log_prob

    return _inner_skellam_log

with pymc3.Model() as model:

    beta0 = pymc3.Normal('beta0', mu=0, sd=100)
    beta1 = pymc3.Normal('beta1', mu=0, sd=100)

    mean = pymc3.Deterministic('mean', beta0 * X['constant'] + beta1 * X['var'])

    var = pymc3.DiscreteUniform('var', lower=1, upper=1000)

    # invert the reparameterization above to recover the two Poisson rates
    mu1 = (var + mean) / 2
    mu2 = (var - mean) / 2

    obs = pymc3.DensityDist('obs', skellam_log(mu1, mu2), observed=Y)

    trace = pymc3.sample(2000, njobs=2, tune=500)

Theano has a theano.scalar.basic_scipy.iv, but I think only on master. It should work if you use that. I don't expect it to be particularly numerically stable, though: iv grows roughly exponentially in its argument, so it can overflow before the log is taken. The Stan implementation probably has the same problem.
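For illustration, here is a sketch of the earlier log-density using that op instead of the as_op wrapper (the Elemwise wrapping is my assumption about how to apply the scalar op to tensors; newer theano builds expose the elementwise version directly as theano.tensor.iv):

import theano.tensor as tt
from theano.scalar.basic_scipy import iv as scalar_iv

# elementwise version of the scalar Bessel op; unlike the as_op wrapper,
# it defines a gradient (w.r.t. its second argument), so NUTS can use it
iv = tt.Elemwise(scalar_iv)

def skellam_log(mu1, mu2):
    def _inner_skellam_log(k):
        total = -mu1 - mu2 + (tt.log(mu1) - tt.log(mu2)) * k / 2
        return total + tt.log(iv(k, 2 * tt.sqrt(mu1 * mu2)))
    return _inner_skellam_log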

Thanks for the pointer. It appears to work with the iv implementation from theano master, although I do get warnings about divergent samples. In any case, I have uploaded a notebook here for anyone who is interested.
