Modeling count differences (Skellam distribution)

Hi,

Does anyone have experience modeling count differences using the Skellam distribution? The Skellam distribution is currently not supported by pymc3 (though it is available in scipy). I am not familiar with the internals of pymc3 and don't know how difficult it is to add a new distribution, but I would be happy to contribute with some guidance. Or perhaps there are alternatives for modeling count differences?

Thanks,
Fabian

The easiest way to model counts or count differences is to use a GLM with a log link function.
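As a sketch of that idea (with the extra assumption, not from your post, that the two underlying counts y1 and y2 are observed separately rather than only their difference), each count gets its own Poisson likelihood with a log link, and the expected difference falls out as a deterministic quantity:

import numpy as np
import pymc3
import theano.tensor as tt

# hypothetical data: both underlying counts observed, not just the difference
x = np.random.normal(size=100)
y1 = np.random.poisson(np.exp(1.0 + 0.5 * x))
y2 = np.random.poisson(np.exp(0.5 - 0.2 * x))

with pymc3.Model():
    b0 = pymc3.Normal('b0', mu=0, sd=10, shape=2)  # intercepts
    b1 = pymc3.Normal('b1', mu=0, sd=10, shape=2)  # slopes
    lam1 = tt.exp(b0[0] + b1[0] * x)  # log link: log(lam) is linear in x
    lam2 = tt.exp(b0[1] + b1[1] * x)
    pymc3.Poisson('y1', mu=lam1, observed=y1)
    pymc3.Poisson('y2', mu=lam2, observed=y2)
    # posterior of the expected count difference
    pymc3.Deterministic('diff', lam1 - lam2)
    trace = pymc3.sample(1000)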

Otherwise, you can define a custom log-likelihood density using pm.DensityDist; you can find an example here. But I am not sure whether there is a numerically stable way to compute the Skellam log-likelihood.
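(For the mechanics only, here is a minimal DensityDist sketch with a hand-written Poisson log-likelihood standing in for the density; the names are illustrative, not from the linked example:)

import numpy as np
import pymc3
import theano.tensor as tt

data = np.random.poisson(5., size=100)

with pymc3.Model():
    lam = pymc3.HalfNormal('lam', sd=10.)

    # DensityDist accepts any function mapping observed values to a log-density
    def logp(value):
        return value * tt.log(lam) - lam - tt.gammaln(value + 1)

    pymc3.DensityDist('obs', logp, observed=data)
    trace = pymc3.sample(1000)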


I tried your second suggestion (see here for an example of using the Skellam distribution in Stan). However, I got stuck getting the evaluation of the modified Bessel function of the first kind to work on theano tensors. Below is the (non-working) code, in case anyone has ideas on how to proceed. In the meantime, I will consider your first suggestion.

import numpy as np
import pymc3
import theano
import pandas
import scipy.stats
import scipy.special

import matplotlib.pyplot as plt
import seaborn as sns

%matplotlib inline

# create dataset
N = 100
X = pandas.DataFrame({'constant':np.ones(N), 'var':np.random.normal(size=N)})
beta_true = pandas.Series({'constant':25, 'var':20})
mu_true = np.dot(X, beta_true)
var_true = 120

# reparameterization: mu  = mu1 - mu2
#                     var = mu1 + mu2

mu1_true = (var_true + mu_true)/2
mu2_true = (var_true - mu_true)/2

Y = scipy.stats.skellam.rvs(mu1_true, mu2_true)

# model
# wrap the modified Bessel function of the first kind as a theano op
# (note: ops created with as_op define no gradient, so NUTS cannot use
# gradient information through this function; the itypes also expect
# float64 vectors, so integer observed data may need casting)
@theano.compile.ops.as_op(itypes=[theano.tensor.dvector, theano.tensor.dvector],
                          otypes=[theano.tensor.dvector])
def iv(a, b):
    return scipy.special.iv(a, b)

def skellam_log(mu1, mu2):
    # Skellam log-pmf:
    # log P(k) = -(mu1 + mu2) + (k / 2) * log(mu1 / mu2) + log I_k(2 * sqrt(mu1 * mu2))
    def _inner_skellam_log(k):
        total = (-mu1 - mu2) + (theano.tensor.log(mu1) - theano.tensor.log(mu2)) * k / 2
        log_prob = total + theano.tensor.log(iv(k, 2 * theano.tensor.sqrt(mu1 * mu2)))
        return log_prob

    return _inner_skellam_log

with pymc3.Model() as model:

    beta0 = pymc3.Normal('beta0', mu=0, sd=100)
    beta1 = pymc3.Normal('beta1', mu=0, sd=100)

    mean = pymc3.Deterministic('mean', beta0 * X['constant'] + beta1 * X['var'])

    var = pymc3.DiscreteUniform('var', lower=1, upper=1000)

    # invert the reparameterization above to recover the two Poisson rates
    mu1 = (var + mean) / 2
    mu2 = (var - mean) / 2

    obs = pymc3.DensityDist('obs', skellam_log(mu1, mu2), observed=Y)

    trace = pymc3.sample(2000, njobs=2, tune=500)

Theano has a theano.scalar.basic_scipy.iv, but I think only on master. It should work if you use that. I don't expect it to be particularly numerically stable, though: iv grows roughly exponentially in its argument, so it can overflow before the log is taken. The Stan implementation probably has the same problem.
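For illustration, here is a sketch of the earlier log-density using that op instead of the as_op wrapper (the Elemwise wrapping is my assumption about how to apply the scalar op to tensors; newer theano builds expose the elementwise version directly as theano.tensor.iv):

import theano.tensor as tt
from theano.scalar.basic_scipy import iv as scalar_iv

# elementwise version of the scalar Bessel op; unlike the as_op wrapper,
# it defines a gradient (w.r.t. its second argument), so NUTS can use it
iv = tt.Elemwise(scalar_iv)

def skellam_log(mu1, mu2):
    def _inner_skellam_log(k):
        total = -mu1 - mu2 + (tt.log(mu1) - tt.log(mu2)) * k / 2
        return total + tt.log(iv(k, 2 * tt.sqrt(mu1 * mu2)))
    return _inner_skellam_log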

Thanks for the pointer. It appears to work with the iv implementation from theano master, although I do get warnings about divergent samples. In any case, I have uploaded a notebook here for anyone who is interested.
