Recommended way to create a new discrete distribution?

I was noticing how much faster BetaBinomial is compared to explicitly constructing Beta and Binomial random variables.

I’d like to create a similar distribution, called Gb1Binomial, where the Beta is replaced by a GB1 distribution (generalized Beta distribution of the first kind: if p \sim Beta(\cdots), then p^\delta \sim GB1(\cdots) for \delta > 0).

I see in the Developer Guide that I can create a subclass of pymc3.Discrete that implements logp and random, and that would be my inclination. However, looking at the implementation of BetaBinomial, I see that it uses functions from pymc3.Discrete like draw_values and generate_samples which aren’t exported by the top-level package, and so aren’t available when I import pymc3.

How necessary is the use of these functions and the machinery of _DrawValuesContext in implementing my own distribution? Can I implement my Gb1Binomial in my own application or should I develop it in a clone of the pymc3 repo?

Also, I see how custom distributions can be made with pymc3.DensityDist. This seems like it’ll be less efficient since we give it a log-likelihood function, and no implementation of sampling. Is that understanding accurate? And since I’m able to create a sampling scheme for Gb1Binomial, I should go all out and write an entire distribution?

After a bit of digging!, I found with the following imports I could copy-paste BetaBinomial's source code in my own repo, give the class a new name, and invoke it in a model:

import theano.tensor as tt
from pymc3 import Discrete, floatX, intX
from pymc3.math import tround
from pymc3.distributions.bound import bound
from pymc3.distributions.discrete import binomln, betaln
from pymc3.distributions import draw_values, generate_samples

Sorry for the confusion! I’ll check in if I encounter more issues.

BTW, you dont need the random method necessary - you can still do inference as long as the logp method is implemented.

1 Like

I renamed BetaBinomial to Gb1Binomial and adapted the logp and _random methods but my chains completely fail to mix :sob:, the draw rate is ~2 draws per second, and the estimates are completely wrong. I wonder if I have a mistranslation from Numpy to Theano, or if there’s a bug in my PyMC class, or if my probability analysis is just wrong.

A set of three scripts are at https://gist.github.com/fasiha/5bf34e1882fa09ea9b60f32dfd38540f but I’ll summarize them here:

The Gb1Binomial distribution is similar to BetaBinomial, in that p \sim Beta(\alpha, \beta) except instead of k \sim Binomial(n, p), I have k \sim Binomial(n, p^\delta) for \delta > 0.

It can be generated in the following way, very similar to BetaBinomial:

def genGb1Binom(Ndata, ALPHA, BETA, DELTA, N):
    n = np.ones(Ndata, dtype=int) * N
    p = beta.rvs(ALPHA, BETA, size=Ndata)**DELTA
    return binom.rvs(n, p)

Its likelihood is more complicated than BetaBinomial; it needs a couple of helper functions: binomln is included in PyMC but I need a sign-aware logsumexp (like Scipy’s), so I include both here:

def binomln(n, k):
    return -betaln(1 + n - k, 1 + k) - np.log(n + 1)


def logsumexp(x, signs):
    "Adaptation of PyMC's logsumexp, but can take a list of signs like Scipy"
    x_max = np.max(x)
    result = np.sum(signs * np.exp(x - x_max))
    return np.log(result if result >= 0 else -result) + x_max


def logp(k, alpha, beta, delta, n):
    i = np.arange(n - k + 1.0)
    mags = binomln(n - k, i) + betaln(delta * (i + k) + alpha, beta)
    signs = (-1.0)**i
    return binomln(n, k) - betaln(alpha, beta) + logsumexp(mags, signs)

I’ve validated this scheme of drawing random variables and this logp in validate_gb1_logp.py in the above Gist.

Then I ported it to a PyMC class (gb1binomial.py in the Gist).

The _random method is almost exactly the same as BetaBinomial's, except I have _p = _p**delta, which is the sole difference between that and Gb1Binomial.

Then I had to adapt the logp to Theano tensors, and this is what I came up with: first a Theano port of the logsumexp helper and then logp:

def logsumexp(x, signs, axis=None):
    "Adaptation of PyMC's logsumexp, but can take a list of signs like Scipy"
    x_max = tt.max(x, axis=axis, keepdims=True)
    result = tt.sum(signs * tt.exp(x - x_max), axis=axis, keepdims=True)
    return tt.switch(result >= 0,
                     tt.log(result) + x_max,
                     tt.log(-result) + x_max)


def logp(self, value):
    def mapper(n, k, alpha, beta, delta):
        i = tt.arange(n - k + 1.0)
        mags = binomln(n - k, i) + betaln(delta * (i + k) + alpha, beta)
        signs = (-tt.ones_like(i))**i  # (-1.0)**i
        return binomln(n, k) - betaln(alpha, beta) + logsumexp(mags, signs)

    return bound(
        theano.map(mapper,
                    sequences=[self.n, value],
                    non_sequences=[self.alpha, self.beta,
                                    self.delta])[0], value >= 0,
        value <= self.n, self.alpha > 0, self.beta > 0, self.delta > 0)

I create a simple PyMC model in demo_gb1_pymc.py in the above Gist. It compiles and runs, albeit very very slowly, and produces nonsense results:

Running on PyMC3 v3.8
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (2 chains in 2 jobs)
NUTS: [d, b, a]
Sampling 2 chains, 707 divergences: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 6000/6000 [2:19:32<00:00,  1.40s/draws]
There were 126 divergences after tuning. Increase `target_accept` or reparameterize.
The acceptance probability does not match the target. It is 0.5919737808282699, but should be close to 0.8. Try to increase the number of tuning steps.
There were 581 divergences after tuning. Increase `target_accept` or reparameterize.
The acceptance probability does not match the target. It is 0.31781568556128037, but should be close to 0.8. Try to increase the number of tuning steps.
The rhat statistic is larger than 1.4 for some parameters. The sampler did not converge.
The estimated number of effective samples is smaller than 200 for some parameters.
                 a            b            d
count  4000.000000  4000.000000  4000.000000
mean      9.818727     4.177204     0.818759
std       0.125664     0.016436     0.010090
min       9.575419     4.124418     0.798366
25%       9.695400     4.166789     0.807851
50%       9.835942     4.180275     0.821065
75%       9.936099     4.189759     0.827032
max       9.999970     4.226625     0.839932

I say β€œnonsense” because the samples for the three variables seem to have barely mixed and aren’t what I expect them to be (in the simulation, alpha is 3.3, beta is 4.4, and delta is 0.333).

I realize this is a ton of code and math to take in, but I hope the parallels between Gb1Binomial and what you already have in BetaBinomial are straightforward and some kind soul can point out what I’m doing wrong? Thank you!!!

(N.B. All code samples are on Gist: https://gist.github.com/fasiha/5bf34e1882fa09ea9b60f32dfd38540f)