I’m trying out simple imputation of missing observed values with a Bernoulli distribution and have hit a Theano problem. Does anyone have ideas about how to solve it, or know whether it’s a Theano bug? I’m using PyMC3 version 3.6 and Theano version 1.0.3. A simplified version of my code is as follows:
import numpy as np
import pymc3 as pm
from scipy.stats import bernoulli
# set "true" probability of rain
true_rain = 0.41
# set number of previous "observations"
nobs = 1000
# set the observations
has_rained = bernoulli.rvs(true_rain, size=nobs)
# try substituting in a missing sample
has_rained[20] = -1 # add missing samples as -1
has_rained = np.ma.masked_values(has_rained, value=-1) # create masked array
with pm.Model() as model:
    prain = pm.Uniform('prain', 0.0, 1.0) # prior on probability of rain
    # distribution of prain given the number of observed times it has rained
    rain = pm.Bernoulli('rain', p=prain, observed=has_rained)
    trace = pm.sample(2000, tune=6000, discard_tuned_samples=True, chains=2)
The final lines of the error message that this produces are:
~/.conda/envs/survival/lib/python3.6/site-packages/theano/tensor/type.py in filter_variable(self, other, allow_convert)
    232                 dict(othertype=other.type,
    233                      other=other,
--> 234                      self=self))
    235
    236     def value_validity_msg(self, a):

TypeError: Cannot convert Type TensorType(int64, vector) (of Variable rain_missing_shared__) into Type TensorType(int64, (True,)). You can try to manually convert rain_missing_shared__ into a TensorType(int64, (True,)).
I can only assume that this fails because of how the Bernoulli distribution uses integer or boolean types, as the problem does not come up in this example.
I also see the same error when passing a Theano shared variable, created from a NumPy array of ones and zeros, as the observations to a Bernoulli distribution.
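For anyone trying to reproduce this, here is a rough NumPy-only sanity check (no PyMC3 or Theano needed) that the masked array itself is built as intended. It is only a sketch: the seeded `RandomState` draw is a stand-in I've assumed for `bernoulli.rvs`, and the names `masked`, `n_missing` and `missing_idx` are just for illustration.

```python
import numpy as np

# Assumed stand-in for bernoulli.rvs(true_rain, size=nobs): seeded 0/1 draws
rng = np.random.RandomState(0)
has_rained = rng.binomial(1, 0.41, size=1000)

# Flag one observation as missing with -1, then mask it, as in the code above
has_rained[20] = -1
masked = np.ma.masked_values(has_rained, value=-1)

# Inspect the mask: how many entries are missing, and where they are
mask = np.ma.getmaskarray(masked)
n_missing = int(mask.sum())
missing_idx = np.flatnonzero(mask)

print("number of masked entries:", n_missing)
print("masked positions:", missing_idx)
print("unmasked values present:", np.unique(masked.compressed()))
```

If this reports a single masked entry at position 20 and only zeros and ones among the unmasked values, then the input data looks fine and the problem is presumably further downstream, in how the missing values are handled once they reach the model.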