Negative Binomial Distribution with logp = -inf

Not sure if this is a bug.
However, I’ve been trying to sample a Negative Binomial model but constantly running into logp = -inf.
I have checked that there are no negative values anywhere, so not sure why this is happening.

Here is the code:

with pm.Model() as quantity_model_1:    
    alpha = pm.Uniform("alpha", lower=0, upper=10)
 
    # Priors
    sigma_a = pm.HalfCauchy('sigma_a', 1)

    # Random intercepts
    a = pm.HalfCauchy('a', beta=sigma_a, shape=no_stores)
    b = pm.HalfCauchy('b', beta=1, shape=no_dow)
    d = pm.Normal('d', mu=0.003, sd=0.001,testval = 0.05)     
    k = pm.Deterministic('k',var=b[dow]*pm.math.exp(d*temperature)) ##Note I specifically use exponential function to avoid any negative values. 
    y_hat = pm.Deterministic('y_hat',var=a[storeno]*k)

    # Data likelihood
    vol = pm.NegativeBinomial('vol', mu=y_hat, alpha=alpha, observed=total_quantity)

    start = pm.find_MAP()
    
    step = pm.NUTS()  

    # Inference
    trace = pm.sample(3, start = start, step = step, cores =2, tune=0, chains=1)  #I set it up this way for debugging purposes

Which results in the following:

logp = -inf, ||grad|| = 4,910.9: 100%|██████████| 3/3 [00:00<00:00, 386.14it/s]
Only 3 samples in chain.
Sequential sampling (1 chains in 1 job)
NUTS: [d, b, a, sigma_a, alpha]
0%| | 0/3 [00:00<?, ?it/s]

and subsequently, “ValueError: Bad initial energy: inf. The model might be misspecified.”

When I check the test_point, it is showing “vol” being the culprit for -inf

for RV in quantity_model_1.basic_RVs:
print(RV.name, RV.logp(quantity_model_1.test_point))
alpha_interval__ -1.3862943611198906
sigma_a_log__ -0.7698925914732455
a_log__ -1.539785182946491
b_log__ -5.389248140312718
d -1098.5111832542225
vol -inf

On further investigation:

y_hat.tag.test_value
array([1.83882346, 1.13897182, 1.60580621, ..., 1.39741757, 1.15156967,
1.45517467])

total_quantity
array([7., 2., 4., ..., 1., 1., 1.])

I have attached a pickle of the model. Use this to load it:

with open('quantity_model_fortest.pkl', 'rb') as buff:
data = pickle.load(buff)
quantity_model_1 = data['model']

Versions and main components
PyMC3 Version: 3.4.1
Theano Version: 1.0.2
Python Version: 3.6.5
Operating system: ubuntu
How did you install PyMC3: (conda/pip) pip
quantity_model_fortest.zip

FYI, you can also do model.check_test_point()

The error you are seeing is usually some parameters is not within the support, for example trying to use a negative value for sigma in Normal distribution etc. As a quick try you can run trace = pm.sample() and let the default take care of everything. (PS, using find_MAP output as start value is NOT recommended)

Thanks, appreciate your assistance. I’m a total newbie to pymc3, very much appreciate your experience.

I’ve tried your suggestions but still have the same result. (p.s. I copied those settings from other examples and don’t fully understand the details).

FYI I have tried a range of values and distributions. I found something very peculiar, perhaps it will point to the right direction.

I do not receive any errors in this set up:
k = pm.Deterministic(‘k’,var=b[dow]*pm.math.exp(0)) ## (exp(0) = 1)
y_hat = pm.Deterministic(‘y_hat’,var=a[storeno]*k)
vol = pm.NegativeBinomial(‘vol’, mu=y_hat, alpha=alpha, observed=total_quantity)

However, I do receive errors in this instance:
d=0.0
k = pm.Deterministic(‘k’,var=b[dow]pm.math.exp(dtemperature))
y_hat = pm.Deterministic(‘y_hat’,var=a[storeno]*k)
vol = pm.NegativeBinomial(‘vol’, mu=y_hat, alpha=alpha, observed=total_quantity)

Shouldn’t these be the same?

temperature is an array
array([25.73, 16.15, 23.02, …, 20.24, 16.37, 21.05])
min(temperature) = 5.32
max(temperature) = 39.0

quantity_model_1.check_test_point()
alpha_interval__ -1.390000
sigma_a_log__ -1.140000
a_log__ -2.290000
b_log__ -8.010000
vol -inf

It’s a bit difficult to say without more information of the data. Since in this case the temperature is a vector, k could be broadcast into a different shape that is not intended, which would explain why you see different behaviour using exp(0) and exp(d*temperature)

Could you put the code and data into a self contain notebook and share on eg Gist?

However, I could not figure out how to upload the pickle file to Gist. So I uploaded the pickle to my google drive. Hopefully you can make sense of both. (I’m very much a novice, thanks for your patience).

https://drive.google.com/file/d/1Rvly357y_VrkswzCBRPqakhuNKa8dB3_/view?usp=sharing

No worries, I am happy to help :smile:. Just FYI, you can save your data (or a subset of it) as a txt/csv file, and then you can upload it to Gist to have everything in one place.

So the following is usually my workflow:
I ran Sample and get an error, first I would check the test points:

quantity_model_1.check_test_point()

As you already found out, vol is -inf, so the next step is to check its parameters. alpha is a single parameter so it is usually fine, so I move on to check y_hat:

y_hat_testval = y_hat.tag.test_value
np.sum(~np.isfinite(y_hat_testval))

Turns out there are 49 non-finite values. Since y_hat = a[storeno]*k, I did:

np.sum(~np.isfinite(a[storeno].tag.test_value))
np.sum(~np.isfinite(k.tag.test_value))

So the problem is k… keep checking:

np.sum(~np.isfinite(b[dow].tag.test_value))
np.sum(~np.isfinite(pm.math.exp(d * temperature).tag.test_value))
np.sum(~np.isfinite(temperature))

Turns out there are nans in temperature:

temperature[~np.isfinite(temperature)]

Hope this gives you some ideas of how to debug a -inf problem :wink: Now remove the nans in temperature and try again the model. FYI you can usually do log(temperature) to make the model more numerical stable.

Thanks, that really helped! I have 2 followup questions:

a) While the simulation ran successfully, it did give the wrong dimensions. It created y_hat__5910. How can I address this so it gives the orrect dimension for k?

b) I filled the nan values in temperature by doing
temperature[np.isnan(temperature)]=np.nanmean(temperature)

I read somewhere that it’s possible to use a masked array instead. Would you recommend it? If so, what’s the correct way of implementing it?

Many thanks for your help.

  1. Remember to remove also the entries of other inputs like dow, storeno where temperature is nan. In another word you need to remove the whole slice.
  2. Using masked array is more for observed RV, which does not applied here in this case.

Just to clarify, by “removing the whole slice” do you mean replacing this line:
y_hat = pm.Deterministic('y_hat', <some formula>)
with
y_hat = <some formula>
?

When I tried doing this, indeed I don’t see the problem in the trace any more.

That is a shame, because y_hat is an important statistic that I would like to collect and study. It doesn’t seem to give me the ability to collect trace on y_hat any more. Is this correct?

thanks for your help.

Oh my bad - you are replacing the nan in temperature using:

right? In that case there shouldnt be any shape error. I dont quite understand about:

Let me clarify:

Yes, I did replace the nan in temperature using
temperature[np.isnan(temperature)]=np.nanmean(temperature)

Subsequently, I changed
version a) y_hat = pm.Deterministic('y_hat', a[storeno]*b[dow]*pm.math.exp(d*temperature))
to
version b) y_hat = a[storeno]*b[dow]*pm.math.exp(d*temperature)

Under version a), when I ran pm.summary(trace), there were 5910 rows for y_hat
[y_hat__1, y_hat_2 … y_hat__5910 ]
This is unfortunate because it deems y_hat to have shape of 5910 instead of 1. That’s what I meant by ‘shape error’. It wasn’t an actual error message, but is an unexpected result.

Under version b), because y_hat is no longer in the trace, I don’t get the same problem. But I also don’t get any statistics on y_hat, which is a very important variable in this model.

But y_hat should have shape 5910, event from before. As y_hat give linear prediction according to your input, and your input (storeno, dow, temperature) all have 5910 rows

Thanks @junpenglao. Allow me to ask the question differently.

The model is defined as:
vol = pm.NegativeBinomial('vol', mu=y_hat, alpha=alpha, observed=total_quantity)

i.e. y_hat is a very important parameter, in fact, the most important parameter for the Negative Binomial distribution. It allows me to check that the mean is indeed inline with overall numerical mean. Therefore, I’d like to analyze.

Is there a way to do so?

The previous statement,
‘y_hat = a[storeno]*b[dow]pm.math.exp(dtemperature)’
as you pointed out, has shape of 5910. Thus problematic for my analysis.

Thank you!

Ben

You can compute the mean of the posterior sample. But I suggest you to check the fitting a bit differently: you should check each observation against each posterior of y_hat, something like

. You can find more details in eg https://junpenglao.xyz/Blogs/posts/2017-10-23-OOS_missing.html

thank you so much. It’s really helped me a lot.