I renamed `BetaBinomial` to `Gb1Binomial` and adapted the `logp` and `_random` methods, but my chains completely fail to mix, sampling takes ~2 seconds per draw, and the estimates are completely wrong. I wonder if I have a mistranslation from Numpy to Theano, or if there's a bug in my PyMC class, or if my probability analysis is just wrong.
A set of three scripts is at https://gist.github.com/fasiha/5bf34e1882fa09ea9b60f32dfd38540f, but I'll summarize them here:
The `Gb1Binomial` distribution is similar to `BetaBinomial`, in that $p \sim Beta(\alpha, \beta)$, except instead of $k \sim Binomial(n, p)$ I have $k \sim Binomial(n, p^\delta)$ for $\delta > 0$. It can be generated in the following way, very similar to `BetaBinomial`:
import numpy as np
from scipy.stats import beta, binom

def genGb1Binom(Ndata, ALPHA, BETA, DELTA, N):
    n = np.ones(Ndata, dtype=int) * N
    p = beta.rvs(ALPHA, BETA, size=Ndata)**DELTA
    return binom.rvs(n, p)
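For example, drawing a synthetic dataset with the true values quoted at the end of this post could look like the following (`Ndata=1000` and `N=20` are illustrative choices, not values from the scripts):

# illustrative call only: Ndata and N are assumptions; 3.3, 4.4, 0.333 are the
# true alpha, beta, delta quoted later in the post
data = genGb1Binom(Ndata=1000, ALPHA=3.3, BETA=4.4, DELTA=0.333, N=20)
# -> length-1000 integer array, each entry a count between 0 and 20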
Its likelihood is more complicated than `BetaBinomial`'s; it needs a couple of helper functions: `binomln` is included in PyMC, but I need a sign-aware `logsumexp` (like Scipy's), so I include both here:
import numpy as np
from scipy.special import betaln

def binomln(n, k):
    return -betaln(1 + n - k, 1 + k) - np.log(n + 1)

def logsumexp(x, signs):
    "Adaptation of PyMC's logsumexp, but can take a list of signs like Scipy"
    x_max = np.max(x)
    result = np.sum(signs * np.exp(x - x_max))
    return np.log(result if result >= 0 else -result) + x_max

def logp(k, alpha, beta, delta, n):
    i = np.arange(n - k + 1.0)
    mags = binomln(n - k, i) + betaln(delta * (i + k) + alpha, beta)
    signs = (-1.0)**i
    return binomln(n, k) - betaln(alpha, beta) + logsumexp(mags, signs)
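For reference, the alternating-signed sum inside `logp` comes from expanding $(1 - p^\delta)^{n-k}$ with the binomial theorem inside the Beta integral:

$$
P(k \mid n, \alpha, \beta, \delta)
= \binom{n}{k}\int_0^1 p^{\delta k}\,(1-p^\delta)^{n-k}\,\frac{p^{\alpha-1}(1-p)^{\beta-1}}{B(\alpha,\beta)}\,dp
= \frac{\binom{n}{k}}{B(\alpha,\beta)} \sum_{i=0}^{n-k} (-1)^i \binom{n-k}{i}\, B\big(\delta(i+k)+\alpha,\ \beta\big),
$$

which is exactly what the code above evaluates in log space, using the sign-aware `logsumexp` to handle the alternating terms.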
I've validated this scheme of drawing random variables and this `logp` in validate_gb1_logp.py in the above Gist.
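For anyone who doesn't want to open the Gist, the kind of check involved is roughly this (a sketch with illustrative parameter values, assuming the NumPy functions above are in scope; the actual script is validate_gb1_logp.py):

import numpy as np

alpha, bet, delta, n = 3.3, 4.4, 0.333, 20  # illustrative values only

# the PMF implied by logp should sum to one over k = 0..n
ks = np.arange(n + 1)
pmf = np.array([np.exp(logp(k, alpha, bet, delta, n)) for k in ks])
assert np.isclose(pmf.sum(), 1.0)

# empirical frequencies from the generator should match exp(logp)
draws = genGb1Binom(200_000, alpha, bet, delta, n)
freqs = np.bincount(draws, minlength=n + 1) / draws.size
assert np.allclose(freqs, pmf, atol=0.01)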
Then I ported it to a PyMC class (gb1binomial.py in the Gist).
The `_random` method is almost exactly the same as `BetaBinomial`'s; the sole difference is `_p = _p**delta`.
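Stripped of the shape plumbing it inherits from `BetaBinomial`, the idea is just this (a rough sketch, not a copy of the Gist; `stats` is scipy.stats):

def _random(self, alpha, beta, delta, n, size=None):
    # draw p ~ Beta(alpha, beta), raise it to delta, then draw k ~ Binomial(n, p**delta)
    _p = stats.beta.rvs(alpha, beta, size=size)
    _p = _p**delta  # the sole change relative to BetaBinomial's _random
    return stats.binom.rvs(n, _p, size=size)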
Then I had to adapt the `logp` to Theano tensors, and this is what I came up with: first a Theano port of the `logsumexp` helper, and then `logp`:
import theano
import theano.tensor as tt
# binomln, betaln, and bound come from PyMC (see gb1binomial.py in the Gist)

def logsumexp(x, signs, axis=None):
    "Adaptation of PyMC's logsumexp, but can take a list of signs like Scipy"
    x_max = tt.max(x, axis=axis, keepdims=True)
    result = tt.sum(signs * tt.exp(x - x_max), axis=axis, keepdims=True)
    return tt.switch(result >= 0,
                     tt.log(result) + x_max,
                     tt.log(-result) + x_max)

def logp(self, value):
    def mapper(n, k, alpha, beta, delta):
        i = tt.arange(n - k + 1.0)
        mags = binomln(n - k, i) + betaln(delta * (i + k) + alpha, beta)
        signs = (-tt.ones_like(i))**i  # (-1.0)**i
        return binomln(n, k) - betaln(alpha, beta) + logsumexp(mags, signs)

    return bound(
        theano.map(mapper,
                   sequences=[self.n, value],
                   non_sequences=[self.alpha, self.beta, self.delta])[0],
        value >= 0, value <= self.n,
        self.alpha > 0, self.beta > 0, self.delta > 0)
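One quick way to rule out a gross mistranslation in at least the helper is to compile the Theano `logsumexp` and compare it against Scipy's sign-aware version on a toy input, something like this (not part of the Gist):

import numpy as np
import theano
import theano.tensor as tt
from scipy.special import logsumexp as scipy_logsumexp

x = tt.dvector('x')
s = tt.dvector('s')
lse = theano.function([x, s], logsumexp(x, s))  # the Theano helper above

xs = np.log(np.arange(1.0, 6.0))               # log([1, 2, 3, 4, 5])
signs = np.array([1.0, -1.0, 1.0, -1.0, 1.0])  # 1 - 2 + 3 - 4 + 5 = 3
ref, sign = scipy_logsumexp(xs, b=signs, return_sign=True)
assert sign > 0 and np.isclose(float(lse(xs, signs)), ref)  # both should equal log(3)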
I create a simple PyMC model in demo_gb1_pymc.py in the above Gist. It compiles and runs, albeit very, very slowly, and produces nonsense results:
Running on PyMC3 v3.8
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (2 chains in 2 jobs)
NUTS: [d, b, a]
Sampling 2 chains, 707 divergences: 100%|████████████████████████████████████████████████████████████████████████| 6000/6000 [2:19:32<00:00, 1.40s/draws]
There were 126 divergences after tuning. Increase `target_accept` or reparameterize.
The acceptance probability does not match the target. It is 0.5919737808282699, but should be close to 0.8. Try to increase the number of tuning steps.
There were 581 divergences after tuning. Increase `target_accept` or reparameterize.
The acceptance probability does not match the target. It is 0.31781568556128037, but should be close to 0.8. Try to increase the number of tuning steps.
The rhat statistic is larger than 1.4 for some parameters. The sampler did not converge.
The estimated number of effective samples is smaller than 200 for some parameters.
                 a            b            d
count  4000.000000  4000.000000  4000.000000
mean      9.818727     4.177204     0.818759
std       0.125664     0.016436     0.010090
min       9.575419     4.124418     0.798366
25%       9.695400     4.166789     0.807851
50%       9.835942     4.180275     0.821065
75%       9.936099     4.189759     0.827032
max       9.999970     4.226625     0.839932
I say "nonsense" because the samples for the three variables seem to have barely mixed and aren't what I expect them to be (in the simulation, `alpha` is 3.3, `beta` is 4.4, and `delta` is 0.333).
I realize this is a ton of code and math to take in, but I hope the parallels between `Gb1Binomial` and what you already have in `BetaBinomial` are straightforward enough that some kind soul can point out what I'm doing wrong? Thank you!!!
(N.B. All code samples are on Gist: https://gist.github.com/fasiha/5bf34e1882fa09ea9b60f32dfd38540f)