Recommended way to create a new discrete distribution?

I renamed BetaBinomial to Gb1Binomial and adapted the logp and _random methods, but my chains completely fail to mix :sob:, the draw rate is ~2 draws per second, and the estimates are completely wrong. I wonder whether I have a mistranslation from Numpy to Theano, a bug in my PyMC class, or a plain mistake in my probability analysis.

A set of three scripts is at https://gist.github.com/fasiha/5bf34e1882fa09ea9b60f32dfd38540f, but I’ll summarize them here:

The Gb1Binomial distribution is similar to BetaBinomial in that p \sim Beta(\alpha, \beta), except that instead of k \sim Binomial(n, p), I have k \sim Binomial(n, p^\delta) for \delta > 0.
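
Spelling out the marginal pmf I believe I’m computing: expanding (1 - p^\delta)^{n-k} with the binomial theorem and integrating term by term against the Beta(\alpha, \beta) density gives

P(k \mid n, \alpha, \beta, \delta) = \binom{n}{k} \frac{1}{B(\alpha, \beta)} \sum_{i=0}^{n-k} (-1)^i \binom{n-k}{i} B(\delta (k + i) + \alpha, \beta),

which is what the logp functions below evaluate in log space; the alternating signs are why I need a sign-aware logsumexp.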

It can be generated in the following way, very similar to BetaBinomial:

import numpy as np
from scipy.stats import beta, binom

def genGb1Binom(Ndata, ALPHA, BETA, DELTA, N):
    n = np.ones(Ndata, dtype=int) * N
    # p ~ Beta(ALPHA, BETA), raised to DELTA before feeding the Binomial
    p = beta.rvs(ALPHA, BETA, size=Ndata)**DELTA
    return binom.rvs(n, p)

Its likelihood is more complicated than BetaBinomial’s and needs a couple of helper functions: binomln is included in PyMC, but I need a sign-aware logsumexp (like Scipy’s), so I include both here:

import numpy as np
from scipy.special import betaln

def binomln(n, k):
    "Log of the binomial coefficient."
    return -betaln(1 + n - k, 1 + k) - np.log(n + 1)


def logsumexp(x, signs):
    "Adaptation of PyMC's logsumexp, but can take a list of signs like Scipy"
    x_max = np.max(x)
    result = np.sum(signs * np.exp(x - x_max))
    return np.log(result if result >= 0 else -result) + x_max


def logp(k, alpha, beta, delta, n):
    # Alternating-sign series from expanding (1 - p**delta)**(n - k); see the pmf above
    i = np.arange(n - k + 1.0)
    mags = binomln(n - k, i) + betaln(delta * (i + k) + alpha, beta)
    signs = (-1.0)**i
    return binomln(n, k) - betaln(alpha, beta) + logsumexp(mags, signs)
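
(Quick aside, not in the Gist: this sign-aware logsumexp agrees with Scipy’s if you pass the signs via b= and ask for return_sign=True, e.g.:)

import numpy as np
from scipy.special import logsumexp as scipy_logsumexp

x = np.array([0.1, 2.3, 1.7])
signs = np.array([1.0, -1.0, 1.0])
print(logsumexp(x, signs))  # log|sum(signs * exp(x))|, about 1.222
print(scipy_logsumexp(x, b=signs, return_sign=True))  # same magnitude; the sign (-1.0) is returned separately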

I’ve validated this scheme of drawing random variables and this logp in validate_gb1_logp.py in the above Gist.
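
That check is, roughly, in this spirit (simplified here; N and the sample size are placeholders, the exact details are in the Gist):

import numpy as np

# Compare empirical frequencies of the draws against exp(logp);
# they should agree up to Monte Carlo error.
ALPHA, BETA, DELTA, N = 3.3, 4.4, 0.333, 20  # N = 20 is an arbitrary choice here
draws = genGb1Binom(100_000, ALPHA, BETA, DELTA, N)
for k in range(N + 1):
    print(k, np.mean(draws == k), np.exp(logp(k, ALPHA, BETA, DELTA, N)))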

Then I ported it to a PyMC class (gb1binomial.py in the Gist).

The _random method is almost exactly the same as BetaBinomial's: the line _p = _p**delta is the sole difference.
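
Something like this, in other words, minus the shape/broadcasting plumbing that BetaBinomial's sampling machinery handles (simplified sketch, not the literal Gist code):

from scipy import stats

def _random(self, alpha, beta, delta, n, size=None):
    # Draw p from the Beta prior, raise it to delta (the only substantive
    # change from BetaBinomial._random), then feed it to the Binomial.
    _p = stats.beta.rvs(a=alpha, b=beta, size=size)
    _p = _p**delta  # the sole difference from BetaBinomial
    return stats.binom.rvs(n, _p)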

Then I had to adapt the logp to Theano tensors. Here is what I came up with: first a Theano port of the logsumexp helper, then logp:

import theano
import theano.tensor as tt
from pymc3.distributions.dist_math import betaln, binomln, bound


def logsumexp(x, signs, axis=None):
    "Adaptation of PyMC's logsumexp, but can take a list of signs like Scipy"
    x_max = tt.max(x, axis=axis, keepdims=True)
    result = tt.sum(signs * tt.exp(x - x_max), axis=axis, keepdims=True)
    return tt.switch(result >= 0,
                     tt.log(result) + x_max,
                     tt.log(-result) + x_max)


def logp(self, value):  # method on the Gb1Binomial class
    def mapper(n, k, alpha, beta, delta):
        # Same alternating-sign series as the NumPy logp above
        i = tt.arange(n - k + 1.0)
        mags = binomln(n - k, i) + betaln(delta * (i + k) + alpha, beta)
        signs = (-tt.ones_like(i))**i  # (-1.0)**i
        return binomln(n, k) - betaln(alpha, beta) + logsumexp(mags, signs)

    return bound(
        theano.map(mapper,
                   sequences=[self.n, value],
                   non_sequences=[self.alpha, self.beta, self.delta])[0],
        value >= 0, value <= self.n,
        self.alpha > 0, self.beta > 0, self.delta > 0)
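
The driver script, demo_gb1_pymc.py in the Gist, then sets up a simple model on synthetic data. Roughly like this (the priors, Ndata, and N below are placeholders, not necessarily what the Gist uses; the sample counts match the run shown next):

import numpy as np
import pymc3 as pm

N = 20  # placeholder
# True simulation values alpha=3.3, beta=4.4, delta=0.333; Ndata is a placeholder
data = genGb1Binom(Ndata=1000, ALPHA=3.3, BETA=4.4, DELTA=0.333, N=N)

with pm.Model():
    a = pm.HalfNormal('a', sigma=10.0)  # placeholder prior on alpha
    b = pm.HalfNormal('b', sigma=10.0)  # placeholder prior on beta
    d = pm.HalfNormal('d', sigma=2.0)   # placeholder prior on delta
    # n passed as a vector so theano.map in logp can scan over it alongside the data
    Gb1Binomial('obs', alpha=a, beta=b, delta=d,
                n=N * np.ones_like(data), observed=data)
    trace = pm.sample(2000, tune=1000, chains=2)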

It compiles and runs, albeit very, very slowly, and produces nonsense results:

Running on PyMC3 v3.8
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (2 chains in 2 jobs)
NUTS: [d, b, a]
Sampling 2 chains, 707 divergences: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 6000/6000 [2:19:32<00:00,  1.40s/draws]
There were 126 divergences after tuning. Increase `target_accept` or reparameterize.
The acceptance probability does not match the target. It is 0.5919737808282699, but should be close to 0.8. Try to increase the number of tuning steps.
There were 581 divergences after tuning. Increase `target_accept` or reparameterize.
The acceptance probability does not match the target. It is 0.31781568556128037, but should be close to 0.8. Try to increase the number of tuning steps.
The rhat statistic is larger than 1.4 for some parameters. The sampler did not converge.
The estimated number of effective samples is smaller than 200 for some parameters.
                 a            b            d
count  4000.000000  4000.000000  4000.000000
mean      9.818727     4.177204     0.818759
std       0.125664     0.016436     0.010090
min       9.575419     4.124418     0.798366
25%       9.695400     4.166789     0.807851
50%       9.835942     4.180275     0.821065
75%       9.936099     4.189759     0.827032
max       9.999970     4.226625     0.839932

I say β€œnonsense” because the samples for the three variables seem to have barely mixed and aren’t what I expect them to be (in the simulation, alpha is 3.3, beta is 4.4, and delta is 0.333).

I realize this is a ton of code and math to take in, but I hope the parallels between Gb1Binomial and what you already have in BetaBinomial are straightforward, and that some kind soul can point out what I’m doing wrong. Thank you!!!

(N.B. All code samples are on Gist: https://gist.github.com/fasiha/5bf34e1882fa09ea9b60f32dfd38540f)