Problems sampling a Poisson single hit model

davipatti · September 10, 2024, 8:29pm

Hello

I’m having trouble sampling from a model. The likelihood of the data comes from a Poisson single hit process which simplifies to: \text{Bernoulli}(p=f(x, v)) with f(x, v) = 1-2^{-10^{x+v}}.

Here’s some test data, and what this function looks like with v=4.

import numpy as np
import matplotlib.pyplot as plt
import pymc as pm

def f(v, x):
    return 1.0 - 2.0 ** (-10.0 ** (x + v))

status = np.array([1, 1, 1, 1, 1, 0, 0, 0])
x = np.array([0, -1, -2, -3, -4, -5, -6, -7])

x_plot = np.linspace(-8, 0)
plt.plot(x_plot, f(v=4, x=x_plot), label="f(v=4, x=x)")
plt.scatter(x, status, label="Data")
plt.ylabel("P(positive)")
plt.xlabel("x")
plt.legend()

output

Running this model, even with a prior bang on the expected value, gives 100% divergences:

with pm.Model():
    v = pm.Normal("v", 4.0, 1.0)
    pm.Bernoulli("lik", p=f(v=v, x=x), observed=status)
    pm.sample()

Presumably the issue comes from the fact that v is an exponent of an exponent. I can’t think of how to reparameterize this function. Does anyone have any ideas? Or is there anything else I can try?

Thanks in advance
David

cluhmann · September 11, 2024, 12:28am

I think you are just running into overflow issues. For example, if you clip p your model samples just fine:

import pytensor.tensor as pt

eps = 10e-6
with pm.Model() as model:
    v = pm.Normal("v", 4.0, 45.0)
    p = f(v=v, x=x)
    # clip p to a reasonable range
    p = pt.clip(f(v, x), eps, 1-eps)
    # save p into the inferenceData object
    p = pm.Deterministic("p", p)
    pm.Bernoulli("lik", p=p, observed=status)
    idata = pm.sample()

and produces:

import arviz as az

In [62]: az.summary(idata)[["mean", "sd", "r_hat"]]
Out[77]: 
       mean     sd  r_hat
p[0]  1.000  0.000    NaN
p[1]  1.000  0.000    NaN
p[2]  1.000  0.001    1.0
p[3]  0.984  0.061    1.0
p[4]  0.718  0.269    1.0
p[5]  0.205  0.182    1.0
p[6]  0.026  0.032    1.0
p[7]  0.003  0.003    1.0
v     4.344  0.484    1.0

Note that the posterior of credible values of v essentially prevents any useful variation in the values of p for the first couple of observations (presumably because of the “extreme” values of x), which is likely a symptom of the same issue.

I would expect someone smarter than me would be able to reparameterize this so that there is greater numerical stability as v and/or x grows.

jessegrabowski · September 11, 2024, 3:24am

To figure out if something tricky can be done, I suspect we’ll need to know more about the model and the math that leads up to the exp of exp term.

ricardoV94 · September 11, 2024, 8:52am

Can you define the log(p) instead of p and convert it to logit with something like logp - log1p(-logp)? The Bernoulli can be parametrized by a logit_p directly

davipatti · September 11, 2024, 2:34pm

Thank you for the suggestions!

Clipping works well, and is a usable solution. I only care about the posterior of v, so lack of variation in p for the first few data is probably fine.

Although clipping essentially solves this, I’d be curious if anyone has suggestions for how this may be reparameterised. Here’s some more details on the model. The model is a Poisson single hit with rate:

r = log(2) \times e^{log(b) \times (v + x) }

The probability of positive data (y=1) is:

p=1 - e^{-r}

Substituting in r and simplifying gives: p = 1 - 2^{-b^{x+v}}, which is where f(x, v) came from (with b=10).

(The likelihood is y \sim \text{Bernoulli}(p) )

I had thought about trying to pass logit(p) to pm.Bernoulli, but given this equation for p, I don’t think a nicer form for log(p) is possible?

ricardoV94 · September 11, 2024, 7:58pm

The log converts that first exponent to multiplication

bob-carpenter · September 16, 2024, 4:54pm

Your issue is with this:

This will round to 1 in double precision floating point if x + v is greater than 1.8 or so. In double precision floating point, the largest number less than 1 is around 1-1e-16 and the smallest number greater than 0 is 1e-300 or so. There’s even less range in single-precision (as you get by default in JAX, for example).

So you want to flip things around and use just the -2**(-10**(x + v)) part. If PyMC doesn’t have the complementary parameterization, exploit the equality

\textrm{bernoulli}(y \mid p) = \textrm{bernoulli}(1 - y \mid 1- p).

But even that’s going to be a problem if you have to evaluate f—you want to give the Bernoulli the log probability or logit probability rather than the probability for stability.

davipatti · September 24, 2024, 8:55pm

Hi @ricardoV94 and @bob-carpenter, many thanks for your input. And apologies for the slow response, I’ve been away for a bit.

I think I’m still stuck. Even if define the likelihood in terms of the negative data, using:

p(y=0) = 2^{-10^{x+v}}

then, that does give a nice expression for log(p) =log(2)10^{x+v}

But to calculate logit(p) (to pass to pm.Bernoulli) I still need log(1-p) to plug into:

logit(p) = log(p) - log(1-p)

And I don’t think log(1-p) can’t be simplified to remove the exponent of an exponent…

bob-carpenter · September 24, 2024, 11:11pm

The new GPT reasoning module is pretty good at this kind of thing. It’s suggesting this, where x is your x + v:

import numpy as np

def stable_log_ratio(x):
    a = np.power(10, x) * np.log(2)
    s = -a - np.log(-np.expm1(-a))  # Computes log(2^(-10^x) / (1 - 2^(-10^x))) stably
    return s

The other thing to try would be to use log1p(-p) in place of log(1 - p) in the expression as you have it.

ricardoV94 · September 25, 2024, 4:18am

You don’t need to worry about that sort of more trivial stuff with PyMC as the PyTensor backend does those sort of stabilization rewrites automatically. Writing log(1 - pt.exp(x)) will achieve the same down the road

Also just to be clear, users should use pytensor.tensor and not numpy operations for PyMC models. The same function names/ signatures that exist in numpy should exist in PyTensor

Anyways does the suggestion clear up things @davipatti ?

davipatti · September 27, 2024, 2:53pm

Thanks for the suggestions. Yes, using @bob-carpenter’s stable_log_ratio does sample without divergences.

(I’m now getting errors when I try to set up a hierarchical model for multiple samples, but I think this is unrelated…)

Many thanks!

Topic		Replies	Views
Biased (?) results from Poisson regression model Questions	4	1164	January 17, 2020
Bernoulli samples all the same? Questions	1	2440	May 29, 2018
Sampling from nested models Questions	10	1144	May 27, 2021
Poisson regression model for beginner v5 theano , modeling	5	1679	June 28, 2023
Simple logistic model: Bad initial energy log p's -Inf and Inf Questions	2	399	July 28, 2021

Problems sampling a Poisson single hit model

Related topics