Negative binomial returning -inf logp

emiro · July 8, 2020, 12:49pm

Hi,
I am looking at some RNA count data where the count data is modelled as a negative binomial distribution. The problem however is that the logp of the model is -inf. I read an earlier thread where another user had the same issue, and the problem were nan values in the input. However, in my case this is not the problem. Essentially, this line is enough to reproduce the problem:

Y = pm.NegativeBinomial("Y", mu=mu, alpha=alpha, observed=Y_OBS)

Regardless of mu and alpha (which can be replaced by constants), Y will have a logp of -inf. If i replace Y_OBS with Y_OBS / 100 however it works. So I guess the count data is simple too large? The max value in Y_OBS however is only 80 000, which to me doesn’t seem like it is too big?

Anyway, is there a method for “normalizing” the count data in my case or apply any other kind of transformation? I guess simply dividing the count data by a constant won’t work since Y_OBS is supposed to be discrete.

Thanks!

ckrapu · July 8, 2020, 7:56pm

Here’s a couple of questions - would it be possible for you to provide a full example of a few data points (or a randomly created dataset) that blow up the log density? Does a similar model lead to problems if you use a Poisson likelihood instead? Alternatively, if you are willing to rescale or normalize your data, you could treat this as a gamma regression problem.

emiro · July 9, 2020, 8:32am

I tried using a Poisson likelihood instead of the negative binomial one but with the same result.

I am not sure rescaling the data is a valid option. I am essentially trying to implement the DESeq2 model in pymc but I must admit my theoretical background is a bit shaky. As common in RNASeq analysis we have many low counts, and the idea is to make a robust regression on the counts which avoids the infinite difference between data points with zero and non-zero counts. But maybe it is still possible to do using a gamma distribution (of which I have zero experience with).

I will try to generate small set of data that replicates my problem as I am unfortunately not allowed to share the real data.

emiro · July 10, 2020, 11:58am

I was able to reduce the data to a single point which would give a “SamplingError: Bad initial energy” error. Strangely it was not the largest data point in the data set.

Y_OBS = 100                             # No error
Y_OBS = 561270;                      # Error
Y_OBS = 561270 - 1000;           # Error
Y_OBS = 561270 - 10000;         # No error
Y_OBS = 798568;                      # No error

mu = pm.Normal("mu", mu=0, sd=10)
Y = pm.NegativeBinomial("Y", mu=2**mu, alpha=10, observed=Y_OBS);

Any idea of why this happens?

Thanks!

ckrapu · July 10, 2020, 5:32pm

Thanks for the investigative work tracking down the troublesome data point - I have a couple of things here:

Can you confirm that the model component which is blowing up the logp is the likelihood? See here for how to do that.
Can you try changing alpha individually on its own and then changing the hyperparameters for mu on their own and see whether or not the resulting error occurs?
Finally, can you make a plot of the initial logp for values approaching the bad Y_OBS? If you post a complete minimal viable reproduction I can help with this.

emiro · July 13, 2020, 1:05pm

Thanks for the help. I have taken a closer look at the logp values returned using the following code:

test_values = np.arange(32752, 32772, 1);
for v in test_values:
	model = pm.Model();
	with model:
		mu = pm.Normal("mu", mu=0, sd=10);
		Y = pm.NegativeBinomial("Y", mu=2**mu, alpha=10, observed=v);
			
		for RV in model.basic_RVs:
			if RV.name == "Y":
				print(v, RV.logp(model.test_point));

Comparing RV “Y” and “mu”, it is always “Y” that has problematic values. “mu” is for this particular example -3.2 for all of the test values.
Changing the hyperparameters of “mu” does change the initial logp value of “Y”, but only for observed values which returns a valid logp for “Y” (-inf is still -inf). Changing alpha also does not solve the problem.
I checked the logp of a large range of numbers and which number returns -inf seems quite random. Below are the logp values where I increase the observed value by one each iteration.

For observed values between 32752 to 32757: Logp about -78456
For observed values between 32758 to 32766: Logp positive infinity
For 32767: nan logp
For observed values 32768 and larger: negative infinity

For values much larger than 32768 the logp will become valid again, and for even larger values it will become -inf again. This patterns of valid and non-valid logp values is repeated for a quite large range of numbers.

Maybe a rounding error could be the source of this problem as it seems quite random?

ckrapu · July 14, 2020, 1:45pm

This is very helpful information and I appreciate you trying these values out. 32768 is a special value since it is 2^15 and, allowing for negative numbers, is the maximum value allowed for a 16 bit integer. Since the code runs successfully with no issues on my computer, I’d like to identify any points at which a different integer representation might be used on your computer. Would you mind running the script located here and showing the inputs here?

emiro · July 15, 2020, 11:54am

I solved the problem. The issue was that theano.config.floatX was float32 and not float64, which I think makes pymc3 use int16 instead of int32. Thus Y in the code you linked would be int16 (while the test values were of type int64). Simply changing theano.config.floatX to float64 solved this problem.

Thanks! I would never have figured out this was the issue without your help.

ckrapu · July 15, 2020, 3:22pm

I’m glad to hear you solved it. I often use Theano with the 32 bit floating point representation so I will look out for this issue in the future.

Topic		Replies	Views
Negative Binomial Distribution with logp = -inf Questions	14	1488	July 11, 2018
"Some of the observed values are associated with a non-finite logp" v5	2	702	September 16, 2023
Log probability negative infinity Questions	4	1312	December 31, 2019
Negative binomial as a likelihood v5	3	221	March 3, 2024
Get nan or inf from model.logp (model.test point) is an attestation of incorrectly configured model? Questions	22	8246	June 24, 2017

Negative binomial returning -inf logp

Related topics