Observing the log of Poisson RV

Boilermaker14 · December 11, 2020, 5:15pm

I have a physical model where a discrete sample, Y_{sample}, is taken (poisson process), but then that sample is measured via a separate process as Z_{observed} which is dependent on the log of the poisson RV Y_{sample} with some gaussian noise. In the latter process, Y_{sampled}==0 from the poisson sampling have Z_{observed} measured as \tau*, but Y_{sampled}>0 has Z_{observed} measured as continuous, and is sampled from a normal distribution around linear function of log(Y_{sampled}).

I’d like to infer \lambda, or rather \theta which leads to \lambda, but I’m having trouble breaking this down into the correct likelihood for Z_{observed}. I think I can split the Z_{observed} into Z_{obs,\tau*}=Z_{observed}[Z_{observed} == \tau^*] which has likelihood given by the Poisson distribution directly. But for the rest of the data, Z_{obs,else}, how should I handle? It seems like I wouldn’t want to use a zero-truncated poisson for this, because that artificially inflates the probability of non-zero $Y_{sampled}, when zero was actually a possibility, just not one that I observed in the Z_{obs,else} array. It seems like I need some way to take the Y_{sampled} RV and truncate it so as not to pass a zero to log() in the Normal likelihood.

Thoughts?

Thanks in Advance.

junpenglao · December 13, 2020, 10:46am

I would also split it up.

for Z_{observed} == \tau*, I use a Poisson likelihood directly:

with model:
    lambda_ = pm.Deterministic("lambda", f(theta, x_predictor))
    pm.Poisson("observed_tau", 
               lambda_[Z_observed == tau],
               observed=np.zeros_like(Z_observed[Z_observed == tau]))

for the rest, I would skip the Poisson latent altogether:

with model:
    pm.Normal("observed_rest", 
               mu=beta0 + beta1 * tt.log(lambda_[Z_observed ~= tau]),
               sigma=sigma_obs,
               observed=Z_observed[Z_observed ~= tau])

junpenglao · December 13, 2020, 10:47am

It’s pretty close to mixture distribution likelihood like zero inflated poisson etc

Boilermaker14 · December 16, 2020, 2:42am

Thanks for the answer @junpenglao. I’ve implemented this, and it runs successfully, but I’m thinking there is some bias introduced for data points with low \lambda and Z_{observed} < \tau^*. We are evaluating only the odds of the data under that pm.Normal() but not the odds that we ought to be using that pm.Normal() model, which is p(Poisson(lambda[Z_observed ~= tau]) ~= 0). For example, a data point <\tau^* may seem very probable under the pm.Normal() model, but in reality is quite unlikely, as it should have been \tau^*. So we need to discount the probability of pm.Normal() by the probability of a non-zero sample from Poisson distribution.

I added the following to the code:

def CdfPosPoisson(lam):
# return the logP of all non-zero results of Poisson distribution. 
return tt.log(1-tt.exp(-lam))

with model:
LessLoD = pm.Potential('LessLoD', CdfPosPoisson(lambda[Z_observed ~= tau]))

which is just adding the logP that we would have sampled this non-zero number from the Poisson, as the CdfPosPoisson() is returning logP of 1-p(0).

This brings me to my new problem. The model runs successfully on some datasets, but not others. When it fails, it fails with zeros on diagonal of mass matrix, or if I initialize with advi, it fails on the first iteration with:
FloatingPointError: NaN occurred in optimization.
The current approximation of RV K.ravel()[0] is NaN.

Again, this only occurs on certain datasets.

I took a look at: NaN occurred in optimization with ADVI - #3 by Nadheesh
and: Get nan or inf from model.logp (model.test point) is an attestation of incorrectly configured model? - #11 by junpenglao

I don’t seem to have anything squirrely with logP:
In [18]: model.check_test_point()
Out[18]:
sigma_DT_log__ -0.77
beta0 -4.83
beta1 -4.83
K -3.91
LoD_obs -11.50
DT_obs -2986.69

and none of the test_values seem to have anything weird going on either.

This leaves your last suggestion from that thread:

make sure the starting value of the approximation is valid:
point = apprx.groups[0].bij.rmap(apprx.params[0].eval())
point #check to see there is no NaN or Inf
for var in logistic_model.free_RVs:
    print(var.name, var.logp(point))
If all the steps above pass fine, there is a problem with setting up the approximation score function, which is not that easy to diagnose.

I’ve never used ADVI for inference, only for initialization. How can I check this last step using ADVI for initialization? Or can you see some other solution to this problem?

Thanks,
Daniel

junpenglao · December 16, 2020, 7:09am

CdfPosPoisson does not have support on R, so my guess is that if you have lambda very close to 0 in the initial condition it breaks the sampler:
https://www.google.com/search?q=log(1-exp(x))&oq=log(1-exp(x))

Also you might consider implementing the cdf in a more numerical stable way (see Eq. 7 in https://cran.r-project.org/web/packages/Rmpfr/vignettes/log1mexp-note.pdf)

Boilermaker14 · December 16, 2020, 2:25pm

That’s fantastic - Thanks @junpenglao! Following the second link, my cdf is now:

def CdfPosPoisson(lam):
    # return the logP of all non-zero results of Poisson distribution.
    return tt.switch(tt.le(lam, np.log(2)),
                     tt.log(-tt.expm1(-lam)),
                     tt.log1p(-tt.exp(-lam)))

This performs quite well.
I had tried some numerical tricks for large lambda, but not for small, and the function (1-e^{-\lambda}) has stability issues at both large and small values of \lambda with a minimum around ln2, where these approximations switch.
Thanks again!

Topic		Replies	Views
Poisson realization of log-normal distribution Questions	4	807	January 28, 2021
Poisson likelihood over a Gaussian process Questions	4	1361	November 13, 2019
What is the best way to estimate theta and psi for ZeroInflatedPoisson? version agnostic modeling	6	959	May 27, 2022
"Generalized ZeroInflatedPoisson" / observed deterministic variable Questions	2	449	December 4, 2020
Observed data dependent on random variables Questions	8	640	December 30, 2020

Observing the log of Poisson RV

Related topics