Traceplot: ValueError: zero-size array to reduction operation maximum which has no identity

Hi Pymc3 community,
I am getting the following error when running traceplot. I think this may be a bug because the trace looks fine to me. Code and output is below, a pickled model/trace object here [because discourse doesnt allow the upload of a .pkl file… sigh].
Thanks!

def run_pooled_model(data):
      with pm.Model() as pooled_model:
            mu_intr = pm.Normal('mu_intr', mu=0.8, sd=100 ** 2)
            mu_tst = pm.Normal('mu_tst', mu=0.0, sd=100 ** 2)
            mu_age = pm.Normal('mu_age', mu=0.0, sd=100 ** 2)
            mu_sex = pm.Normal('mu_sex', mu=0.0, sd=100 ** 2)
            mu_age_sex = pm.Normal('mu_age_sex', mu=0.0, sd=100 ** 2)
            mu_starttime = pm.Normal('mu_starttime', mu=0.0, sd=100 ** 2)
            # Model error
            model_err = pm.HalfNormal('model_err', sd=0.1)

            # Expected value
            y_est = pm.math.invlogit(mu_intr + \
                                 mu_tst * data['tst'] + \
                                 mu_age * data['age'] + \
                                 mu_sex * data['sex'] + \
                                 mu_starttime * data['starttime'] + \
                                 mu_age_sex * data['age'] * data['sex'])

            # Data likelihood
            y_like = pm.Beta('y_like', mu=y_est, sd=model_err, observed=data["sleepefficiency"])

    with pooled_model:
        pooled_trace = pm.sample(2000)
        
    pm.traceplot(pooled_trace[-1000:])

Auto-assigning NUTS sampler…
Initializing NUTS using advi…
Average ELBO = -inf: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 200000/200000 [09:26<00:00, 352.93it/s]
Finished [100%]: Average ELBO = -inf
98%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š| 1970/2000 [00:08<00:00, 209.08it/s]100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰| 1991/2000 [00:09<00:00, 209.33it/s]100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 2000/2000 [00:09<00:00, 220.18it/s]
Traceback (most recent call last):
File β€œC:\Program Files\JetBrains\PyCharm 2017.1.2\helpers\pydev\pydevd.py”, line 1585, in
globals = debugger.run(setup[β€˜file’], None, None, is_module)
File β€œC:\Program Files\JetBrains\PyCharm 2017.1.2\helpers\pydev\pydevd.py”, line 1015, in run
pydev_imports.execfile(file, globals, locals) # execute the script
File β€œC:\Program Files\JetBrains\PyCharm 2017.1.2\helpers\pydev_pydev_imps_pydev_execfile.py”, line 18, in execfile
exec(compile(contents+"\n", file, β€˜exec’), glob, loc)
File β€œC:/Users/bdyet/GoogleDrive/NapArch/DiscreteBayes/MultiLevelModeling/main.py”, line 144, in
pm.traceplot(pooled_trace[-1000:])
File β€œC:\Users\bdyet\Anaconda3\lib\site-packages\pymc3\plots.py”, line 91, in traceplot
kdeplot_op(ax[i, 0], d, prior, prior_alpha, prior_style)
File β€œC:\Users\bdyet\Anaconda3\lib\site-packages\pymc3\plots.py”, line 126, in kdeplot_op
density, l, u = fast_kde(d)
File β€œC:\Users\bdyet\Anaconda3\lib\site-packages\pymc3\plots.py”, line 835, in fast_kde
grid = convolve(grid, kernel, mode=β€˜same’)[npad: npad + nx]
File β€œC:\Users\bdyet\Anaconda3\lib\site-packages\scipy\signal\signaltools.py”, line 783, in convolve
method = choose_conv_method(volume, kernel, mode=mode)
File β€œC:\Users\bdyet\Anaconda3\lib\site-packages\scipy\signal\signaltools.py”, line 673, in choose_conv_method
max_value = int(np.abs(volume).max()) * int(np.abs(kernel).max())
File β€œC:\Users\bdyet\Anaconda3\lib\site-packages\numpy\core_methods.py”, line 26, in _amax
return umr_maximum(a, axis, None, out, keepdims)
ValueError: zero-size array to reduction operation maximum which has no identity

Perhaps this just happened when you copy/pasted to Discourse, but there appears to be an indentation error where y_est, and y_like are not defined within a model context.

Yea sorry, that’s a copy/paste error.

You get an error like that when the trace is empty. Is your trace OK?

Ok this is strange, if i do pm.traceplot(pooled_trace) its fine, but if i do pm.traceplot(pooled_trace[-1000:]) it throws the error.

trace.csv (272.4 KB)

The trace (converted to dataframe and then csv) is attached. No strange numbers, but there is no variability in it and it predicts incorrect values, so i must have a model misspecification somewhere too.

Hmmm, its looks like this error is stochastic :frowning:

the function that computes the kde adds a little jitter to avoid problems when the trace has values with too little variability :frowning: I will inspect your trace in detail and let you know if I find something. BTW is your model_err OK? Did you try with a broader prior?

I can confirm that your trace does not have enough variability, so you have a sampling problem. Nevertheless I can not reproduce your error with traceplot, the error is properly managed and a warning is returned in the traceplot (as expected in these cases), so may be you are using an older version of PyMC3.

Ahh, the explains the randomness.

Hm ok, i updated. Now i get:
ValueError: Bad initial energy: inf. The model might be misspecified.

The model looks fine to me, is there anything you notice that might be wrong?

Here is the data if you want to run it yourself.
data.csv (201.2 KB)

btw I increased model_err’s sd but that did not help.

You model will run into problem quite easily, as the model_err is not bounded: if model_err**2 > y_est*(1 - y_est) the Beta distribution is undefined. You can try doing:

model_err = pm.Uniform('model_err',lower=0, upper=y_est*(1-y_est), shape=(2481,))

eg, see linke below about infering a comment rate - with reparameterization using the mean http://stats.stackexchange.com/questions/67443/does-the-beta-distribution-have-a-conjugate-prior/67700
http://stats.stackexchange.com/questions/12232/calculating-the-parameters-of-a-beta-distribution-using-the-mean-and-variance/12239

You can also reparameterized it using the mode, see John K. Kruschke 2015, cf. Eqn 9.4, pp. 223, also pp. 236 Figure 9.7 for the estimation of kappa using a Gamma prior.

Otherwise, try the logit-normal model as @aseyboldt said in Beta Regression in pyMC3

3 Likes

You would think, @junpenglao, that given i added the PR to updated the documentation on beta sd limits, that I would have considered that! But i did not. Sigh… That appears to be the issue.

I ended up trying all of them (inc. the logit normal @aseyboldt, which was easier than i first though). The humble old normal distribution gave the best fitting model.

Thanks guys for helping me with this, and also for the great pymc3 package, im looking forward to using it more in the future :slight_smile:

2 Likes