sample_prior_predictive() failing based solely on `samples` parameter

Problem description

MWE below. It draws n iid vectors of dimension dim from a normal distribution whose single shared variance (named cov in the code) is drawn from an Inverse Gamma. When samples <= dim it works fine; otherwise it raises the error below. I’m using pymc3 3.6, theano 1.0.3, and python 3.7.1. It feels like there’s a shape issue with the distributions, either in my code or in sample_prior_predictive itself. As far as I can tell, it works fine when drawing with pm.sampling.sample() instead. Thanks.

MWE

import numpy as np
import pymc3 as pm

samples = 4
n = 10
dim = 3
mu = np.array([1] * dim)

with pm.Model():
    cov = pm.distributions.continuous.InverseGamma('cov', alpha=1, beta=1)
    pm.distributions.continuous.Normal('x', mu=mu, sd=pm.math.sqrt(cov), shape=(n, dim))

    pm.sample_prior_predictive(samples=samples)

Error traceback

Traceback (most recent call last):
  File "/test.py", line 13, in <module>
    pm.sample_prior_predictive(samples=samples)
  File "//anaconda3/lib/python3.7/site-packages/pymc3/sampling.py", line 1325, in sample_prior_predictive
    values = draw_values([model[name] for name in names], size=samples)
  File "//anaconda3/lib/python3.7/site-packages/pymc3/distributions/distribution.py", line 369, in draw_values
    size=size)
  File "//anaconda3/lib/python3.7/site-packages/pymc3/distributions/distribution.py", line 463, in _draw_value
    return param.random(point=point, size=size)
  File "//anaconda3/lib/python3.7/site-packages/pymc3/model.py", line 43, in __call__
    return getattr(self.obj, self.method_name)(*args, **kwargs)
  File "//anaconda3/lib/python3.7/site-packages/pymc3/distributions/continuous.py", line 460, in random
    size=size)
  File "//anaconda3/lib/python3.7/site-packages/pymc3/distributions/distribution.py", line 584, in generate_samples
    broadcast_shape = np.broadcast(*inputs).shape # size of generator(size=1)
ValueError: shape mismatch: objects cannot be broadcast to a single shape

It does seem there is some kind of shape problem. @lucianopaz, do you have any idea?

Also, for a little more context: I’m using a modified version of this to generate data for Bayesian linear regression models. Another, similar shape bug I’ve encountered is that the resulting data looks wonky when (coincidentally) samples = n. I haven’t had a chance to figure out exactly how the data is wrong, so I haven’t posted anything about it yet, but the plots look off and my downstream methods then produce incorrect results. My current workaround when samples = n is to first generate n-1 samples and then generate the last sample separately.
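The split-draw workaround described above could be wrapped in a small helper. This is only a sketch: draw_avoiding_size and fake_draw are hypothetical names, and fake_draw stands in for a call such as pm.sample_prior_predictive(samples=k). It assumes the draw function returns a dict of arrays whose leading axis has length k, which may not hold exactly for every pymc3 version.

```python
import numpy as np

def draw_avoiding_size(draw_fn, samples, bad_size):
    """Draw `samples` samples without ever requesting exactly `bad_size` at once.

    draw_fn(k) is assumed to return a dict mapping variable names to arrays
    whose first axis has length k (hypothetical interface).
    """
    # No split needed, or no split possible when only one sample is requested.
    if samples != bad_size or samples < 2:
        return draw_fn(samples)
    first = draw_fn(samples - 1)   # all but one sample
    last = draw_fn(1)              # the remaining sample, drawn separately
    return {name: np.concatenate([first[name], last[name]], axis=0)
            for name in first}

# Hypothetical stand-in for the real prior predictive call.
def fake_draw(k, n=10, dim=3):
    rng = np.random.default_rng(k)
    return {"cov": rng.random(k), "x": rng.random((k, n, dim))}

out = draw_avoiding_size(fake_draw, samples=10, bad_size=10)
print({name: arr.shape for name, arr in out.items()})
```

This only sidesteps the coincidental samples = n case; it does not address the underlying shape ambiguity.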

Thanks for pointing this out! Yes, it’s a shape problem on our side.
Maybe changing the call in this line

  File "//anaconda3/lib/python3.7/site-packages/pymc3/distributions/distribution.py", line 584, in generate_samples
    broadcast_shape = np.broadcast(*inputs).shape # size of generator(size=1)

from broadcast to broadcast_distribution_samples would solve the problem. I’ll explore it when I get the chance next week. In the meantime, feel free to open an issue on GitHub.

As for the second problem, it will be much harder to fix consistently.

This is a fairly difficult problem. It arises because our distributions and random variables have only a single shape parameter, which does not distinguish between the event_shape (the shape of a single draw from the raw distribution) and the sample_shape (the shape passed into sample_prior_predictive). When we feed samples from one random variable into another RV as a parameter, there is an ambiguity: we can’t tell whether the array’s shape comes from the event_shape or the sample_shape. At present we assume the latter, which in some distributions leads us to use a special broadcasting function, broadcast_distribution_samples. We are ill-equipped to deal with this consistently in pymc3 (pymc4 should work much better because we’ll use TFP). The quick and dirty workaround I can think of is to ask for (1, n) samples instead of just n. You should then be able to index into the first element to get the right answer.
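To make the event_shape versus sample_shape ambiguity concrete, here is a NumPy-only sketch. It does not reproduce pymc3 internals; the array shapes are assumptions chosen to mirror the MWE (a mean of shape (dim,) and one standard deviation per prior draw), and try_broadcast is a hypothetical helper.

```python
import numpy as np

samples, n, dim = 4, 10, 3          # same sizes as the MWE
rng = np.random.default_rng(0)

mu = np.ones(dim)                                   # event-shaped parameter, shape (dim,)
cov = 1.0 / rng.gamma(shape=1.0, scale=1.0, size=samples)  # Inverse Gamma(1, 1) draws
sd = np.sqrt(cov)                                   # one sd per prior draw, shape (samples,)

def try_broadcast(*arrays):
    """Return the broadcast shape, or None if NumPy cannot broadcast the inputs."""
    try:
        return np.broadcast(*arrays).shape
    except ValueError:
        return None

# (dim,) against (samples,) cannot be broadcast when the two lengths differ.
mismatch = try_broadcast(mu, sd)

# When the lengths happen to coincide, NumPy broadcasts silently and pairs
# the sample axis with the event axis, which is not what was intended.
coincidence = try_broadcast(mu, sd[:dim])

# Making the sample axis explicit removes the ambiguity.
sd_explicit = sd[:, None, None]                     # shape (samples, 1, 1)
x = mu + sd_explicit * rng.standard_normal((samples, n, dim))

print(mismatch, coincidence, x.shape)
```

The middle case is the troublesome one: nothing fails, so a sample axis can be mistaken for an event axis without any error being raised.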

Ah OK, that makes sense, thanks. I’ll try the samples=(1, n) workaround. I’ve seen this shape confusion pop up in a number of other posts; it’d be helpful to have a short discussion or warning somewhere in the documentation.

As for the first workaround, what’s the proper import for broadcast_distribution_samples? It’s not in numpy. I found an import for it in the pymc3 source code here, but I don’t see the corresponding function in distributions/distribution.py.

EDIT: Looks like broadcast_distribution_samples is new in pymc3 3.7, which is why I couldn’t find it in my local pymc3 source code.

It looks like pm.sample_prior_predictive() only accepts int values for the samples parameter. When I try to pass in (1, n), I get:

    draw = pm.sample_prior_predictive(samples=num_trials)
  File "/anaconda3/lib/python3.7/site-packages/pymc3/sampling.py", line 1325, in sample_prior_predictive
    values = draw_values([model[name] for name in names], size=samples)
  File "/anaconda3/lib/python3.7/site-packages/pymc3/distributions/distribution.py", line 369, in draw_values
    size=size)
  File "/anaconda3/lib/python3.7/site-packages/pymc3/distributions/distribution.py", line 463, in _draw_value
    return param.random(point=point, size=size)
  File "/anaconda3/lib/python3.7/site-packages/pymc3/model.py", line 43, in __call__
    return getattr(self.obj, self.method_name)(*args, **kwargs)
  File "/anaconda3/lib/python3.7/site-packages/pymc3/distributions/continuous.py", line 457, in random
    point=point, size=size)
  File "/anaconda3/lib/python3.7/site-packages/pymc3/distributions/distribution.py", line 400, in draw_values
    size=size)
  File "/anaconda3/lib/python3.7/site-packages/pymc3/distributions/distribution.py", line 508, in _draw_value
    output = np.array([func(*v) for v in zip(*values)])
  File "/anaconda3/lib/python3.7/site-packages/pymc3/distributions/distribution.py", line 508, in <listcomp>
    output = np.array([func(*v) for v in zip(*values)])
  File "/anaconda3/lib/python3.7/site-packages/theano/compile/function_module.py", line 813, in __call__
    allow_downcast=s.allow_downcast)
  File "/anaconda3/lib/python3.7/site-packages/theano/tensor/type.py", line 178, in filter
    data.shape))
TypeError: Bad input argument to theano function with name "/anaconda3/lib/python3.7/site-packages/pymc3/distributions/distribution.py:431" at index 0 (0-based). Wrong number of dimensions: expected 0, got 1 with shape (100,).

Issue submitted here.

Oh, right. That happens with deterministics and symbolic tensors that need to be compiled in draw_values. It was on my mental to-do list, but I never got around to doing it. If you open a separate issue for that error, I’ll eventually sort it out.

Issue for the samples parameter failing with a tuple submitted here.