Error in sample_posterior_predictive when using trace from dataframe

Hi PyMC3 community,
I get a rather cryptic error when calling sample_posterior_predictive on a model with an LKJCholeskyCov prior, using a trace that has been passed through trace_to_dataframe.

Below is a toy example that triggers the problem (with uncorrelated toy data):

    import numpy as np
    import pymc3 as pm
    import theano.tensor as tt

    # Toy data: two uncorrelated rate columns and Poisson counts drawn from them
    x = np.random.normal(10, 1, size=(1000, 2))
    y = np.random.poisson(lam=x, size=(1000, 2))
    with pm.Model() as model:
        sd_dist = pm.HalfNormal.dist(sd=1e2, shape=2)
        chol_packed = pm.LKJCholeskyCov('chol',
                                        n=2, eta=2, sd_dist=sd_dist)
        chol = pm.expand_packed_triangular(2, chol_packed)
        # add correlation between rates
        rate = pm.Deterministic('rate', tt.nnet.softplus(tt.dot(chol, x.T).T))
        obs = pm.Poisson('obs', mu=rate, observed=y)
        trace = pm.sample()

    df_trace = pm.trace_to_dataframe(trace,
                                     varnames=['chol','rate'],
                                     include_transformed=True)

    pm.sample_posterior_predictive(model=model,
                                   trace=df_trace.to_dict('records'),
                                   samples=100)

And the error:

Traceback (most recent call last):
  File "/home/bdyetton/PSleep/src/modeling/run_models.py", line 373, in <module>
    predict_test_with_chol()
  File "/home/bdyetton/PSleep/src/modeling/run_models.py", line 246, in predict_test_with_chol
    samples=100)
  File "/home/bdyetton/anaconda3/envs/psleep/lib/python3.7/site-packages/pymc3/sampling.py", line 1167, in sample_posterior_predictive
    values = draw_values(vars, point=param, size=size)
  File "/home/bdyetton/anaconda3/envs/psleep/lib/python3.7/site-packages/pymc3/distributions/distribution.py", line 627, in draw_values
    size=size)
  File "/home/bdyetton/anaconda3/envs/psleep/lib/python3.7/site-packages/pymc3/distributions/distribution.py", line 817, in _draw_value
    size=None))
  File "/home/bdyetton/anaconda3/envs/psleep/lib/python3.7/site-packages/pymc3/distributions/discrete.py", line 552, in random
    mu = draw_values([self.mu], point=point, size=size)[0]
  File "/home/bdyetton/anaconda3/envs/psleep/lib/python3.7/site-packages/pymc3/distributions/distribution.py", line 627, in draw_values
    size=size)
  File "/home/bdyetton/anaconda3/envs/psleep/lib/python3.7/site-packages/pymc3/distributions/distribution.py", line 794, in _draw_value
    return param.random(point=point, size=size)
  File "/home/bdyetton/anaconda3/envs/psleep/lib/python3.7/site-packages/pymc3/model.py", line 44, in __call__
    return getattr(self.obj, self.method_name)(*args, **kwargs)
  File "/home/bdyetton/anaconda3/envs/psleep/lib/python3.7/site-packages/pymc3/distributions/multivariate.py", line 1263, in random
    samples = self._random(n, eta, size=_size)
  File "/home/bdyetton/anaconda3/envs/psleep/lib/python3.7/site-packages/pymc3/distributions/multivariate.py", line 1208, in _random
    C *= D[..., :, np.newaxis] * D[..., np.newaxis, :]
ValueError: non-broadcastable output operand with shape (1,2,2) doesn't match the broadcast shape (2,2,2)
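
For reference, the final ValueError is an ordinary NumPy in-place broadcasting failure and can be reproduced without PyMC3. The snippet below is only a minimal sketch: the array shapes are assumptions chosen to match the error message, not values read out of PyMC3 internals, and the names `C`, `D` and `scale` simply mirror the last line of the traceback.

```python
import numpy as np

# Assumed shapes, picked to match the message in the traceback.
C = np.ones((1, 2, 2))   # stand-in for a single 2x2 matrix draw
D = np.ones((2, 2))      # stand-in for the standard-deviation draws

# Same expression as the failing line; the product has shape (2, 2, 2).
scale = D[..., :, np.newaxis] * D[..., np.newaxis, :]

try:
    # In-place multiply: the (2, 2, 2) result cannot be written into
    # C's existing (1, 2, 2) buffer, so NumPy raises ValueError.
    C *= scale
    message = None
except ValueError as err:
    message = str(err)

print(message)

# The out-of-place product broadcasts fine because a new array is allocated.
out = C * scale
print(out.shape)
```

Seen this way, the message suggests that the standard-deviation draws carried one more leading dimension than the matrix they were multiplied into, although that reading is a guess from the shapes alone.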

Note that when I don't use trace_to_dataframe and pass the raw trace directly into sample_posterior_predictive, it works fine. However, the real model is hierarchical, and I'm trying to make out-of-sample predictions. I therefore have to leave the grouping parameters out of the trace, as in this answer, which is why I use trace_to_dataframe.

This may be better raised on GitHub.

I ended up working around this bug by using trace.remove_values().
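
Another way to leave variables out without going through a DataFrame is to filter the point dictionaries directly, so that the remaining variable names stay exactly as the model defines them. The helper below is a hypothetical sketch, not a PyMC3 function: `drop_vars` and the variable name `group_offset` are made up for illustration, and the PyMC3 usage in the trailing comment is untested against the model above.

```python
def drop_vars(points, names):
    """Return copies of the point dicts without the given variable names."""
    excluded = set(names)
    return [{k: v for k, v in pt.items() if k not in excluded} for pt in points]


# Plain dicts stand in for trace points; "group_offset" is a made-up name.
demo_points = [
    {"chol": [1.0, 0.0, 1.0], "group_offset": [0.1, -0.2]},
    {"chol": [0.9, 0.1, 1.1], "group_offset": [0.0, 0.3]},
]
kept = drop_vars(demo_points, ["group_offset"])
print(sorted(kept[0]))

# Possible use with PyMC3 (an assumption, not verified here):
#     points = drop_vars(trace.points(), ["group_offset"])
#     pm.sample_posterior_predictive(trace=points, model=model, samples=100)
```

Because the original dictionaries are copied rather than modified, the full trace remains available for in-sample checks.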
