# Problem with imputation of missing data for a Bernoulli distribution

I’m testing some simple imputation of missing observed values with a Bernoulli distribution and have hit a Theano problem. Does anyone have ideas about solving it, or is it a Theano bug? I’m using PyMC3 version 3.6 and Theano version 1.0.3. A simple version of my code is as follows:

``````
import numpy as np
import pymc3 as pm
from scipy.stats import bernoulli

# set "true" probability of rain
true_rain = 0.41

# set number of previous "observations"
nobs = 1000

# set the observations
has_rained = bernoulli.rvs(true_rain, size=nobs)

# try substituting in a missing sample
has_rained[20] = -1  # flag the missing sample as -1

# mask the flagged entry so that PyMC3 treats it as missing (assumed step,
# implied by the `rain_missing_shared__` variable named in the error)
has_rained = np.ma.masked_values(has_rained, value=-1)

with pm.Model() as model:
    prain = pm.Uniform('prain', 0.0, 1.0)  # prior on probability of rain

    # distribution of prain given the number of observed times it has rained
    rain = pm.Bernoulli('rain', p=prain, observed=has_rained)

    trace = pm.sample(2000, tune=6000, discard_tuned_samples=True, chains=2)
``````

The final lines of the error message that this produces are:

``````
~/.conda/envs/survival/lib/python3.6/site-packages/theano/tensor/type.py in filter_variable(self, other, allow_convert)
232             dict(othertype=other.type,
233                  other=other,
--> 234                  self=self))
235
236     def value_validity_msg(self, a):

TypeError: Cannot convert Type TensorType(int64, vector) (of Variable
rain_missing_shared__) into Type TensorType(int64, (True,)). You can try to manually
convert rain_missing_shared__ into a TensorType(int64, (True,)).
``````

I can only assume that this fails because of how the Bernoulli distribution uses integer or boolean types, as this isn’t a problem that is noted in this example.

I also see the same error if I pass a Theano `shared` variable, created from a NumPy array of ones and zeros, as the observations to a Bernoulli distribution.

Yes, there is an issue with masking only 1 value: https://github.com/pymc-devs/pymc3/issues/3122

Unfortunately, we don’t have a fix yet…
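For context, here is a minimal NumPy-only sketch of the situation that issue describes: masking a single entry leaves exactly one missing value, so the array of values to impute has length 1. The data and variable names are illustrative and are not taken from the PyMC3 source.

```python
import numpy as np

# Illustrative data: ten 0/1 observations, one of which is flagged as missing (-1).
has_rained = np.array([0, 1, 1, 0, 1, 0, 0, 1, 0, 1])
has_rained[3] = -1

# Mask the sentinel value so it is treated as missing rather than as data.
masked = np.ma.masked_values(has_rained, value=-1)

n_missing = int(masked.mask.sum())          # number of masked entries
missing_idx = np.flatnonzero(masked.mask)   # positions of the masked entries

print(n_missing, missing_idx.shape)
```

With only one masked entry, the missing part is a length-1 vector, which is the case the linked issue is about.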

Thanks, I’ll have a think about whether this might be a problem for me and if I have any ideas for a fix I’ll be sure to post them on the open issue (although with my very, very limited theano knowledge I doubt I’ll be much help!)

I’ve just posted a potential fix for this here. It just involves adding the lines

``````
if isinstance(var.tag.test_value, np.ndarray):
    if len(var.tag.test_value) == 1:
        shared.type = theano.tensor.TensorType(var.dtype, (True,))
``````

in `model.py` after this line.
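To illustrate the mismatch in the error message, here is a plain NumPy sketch (not Theano code). It assumes that a type inferred from a concrete value flags every length-1 dimension as broadcastable, while a shared variable by default flags none. The two helper functions are hypothetical and only mimic those two rules.

```python
import numpy as np


def pattern_from_value(value):
    """Hypothetical helper: flag every length-1 dimension as broadcastable."""
    return tuple(dim == 1 for dim in np.shape(value))


def default_shared_pattern(value):
    """Hypothetical helper: flag no dimension as broadcastable."""
    return (False,) * np.ndim(value)


one_missing = np.zeros(1, dtype="int64")  # stand-in test value: one masked entry
two_missing = np.zeros(2, dtype="int64")  # stand-in test value: two masked entries

for value in (one_missing, two_missing):
    inferred = pattern_from_value(value)
    shared = default_shared_pattern(value)
    # The patterns disagree only when the vector has length 1.
    print(len(value), inferred, shared, inferred == shared)
```

Under these assumptions the two patterns differ only for a length-1 vector, which matches the `(True,)` versus `vector` types in the traceback and is why the proposed lines set the shared variable's type explicitly in that case.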

This is fixed in PyMC3 with this PR. This is not in a release yet though.
