Problem with imputation of missing data for a Bernoulli distribution

mattpitkin · January 6, 2019, 10:24pm

I’m trying to test out some simple imputation of missing observed values with a Bernoulli distribution and hit a theano problem, and was wondering if anyone had any ideas about solving it, or if it’s a theano bug. I’m using PyMC3 version 3.6 and theano version 1.0.3. A simple version of my code is as follows:

import numpy as np
import pymc3 as pm
from scipy.stats import bernoulli

# set "true" probability of rain
true_rain = 0.41

# set number of previous "observations"
nobs = 1000

# set the observations
has_rained = bernoulli.rvs(true_rain, size=nobs)

# try substituting in a missing sample
has_rained[20] = -1  # add missing samples as -1
has_rained = np.ma.masked_values(has_rained, value=-1)  # create masked array

with pm.Model() as model:
    prain = pm.Uniform('prain', 0.0, 1.0)  # prior on probability of rain

    # distribution of prain given the number of observed times it has rained
    rain = pm.Bernoulli('rain', p=prain, observed=has_rained)

    trace = pm.sample(2000, tune=6000, discard_tuned_samples=True, chains=2)

The final lines of the error message that this produces are:

~/.conda/envs/survival/lib/python3.6/site-packages/theano/tensor/type.py in filter_variable(self, other, allow_convert)
    232             dict(othertype=other.type,
    233                  other=other,
--> 234                  self=self))
    235 
    236     def value_validity_msg(self, a):

TypeError: Cannot convert Type TensorType(int64, vector) (of Variable rain_missing_shared__) into Type TensorType(int64, (True,)). You can try to manually convert rain_missing_shared__ into a TensorType(int64, (True,)).

I can only assume that this is failing due to an issue with the Bernoulli distribution’s use of integer or boolean types, as it isn’t a problem that is noted in this example.

I also see the same error if trying to pass a theano shared variable, created from a numpy array of ones and zeros, as observations to a Bernoulli distribution.

junpenglao · January 6, 2019, 10:45pm

Yes, there is an issue with masking only 1 value: https://github.com/pymc-devs/pymc3/issues/3122

Unfortunately, we don’t have a fix yet…
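
To check whether a dataset falls into the single-masked-value case described in that issue, a minimal NumPy-only sketch such as the following could be used. The helper name `count_missing` and the toy data are assumptions for illustration, not part of PyMC3; the array is built the same way as `has_rained` in the original post.

```python
import numpy as np


def count_missing(observed):
    """Return the number of masked entries (0 for a plain, unmasked array)."""
    return int(np.ma.count_masked(observed))


# Toy stand-in for the `has_rained` array from the original post (assumption)
rng = np.random.default_rng(0)
has_rained = rng.binomial(1, 0.41, size=1000)
has_rained[20] = -1  # mark one sample as missing
masked = np.ma.masked_values(has_rained, value=-1)

n_missing = count_missing(masked)
if n_missing == 1:
    # This is the case reported to trigger the TypeError in PyMC3 3.6
    print("exactly one masked value")

# Possible stopgap (assumption): drop the masked entry instead of imputing it
observed_only = masked.compressed()
```

Passing `observed_only` as `observed=` would avoid the masked array altogether, at the cost of not imputing the missing sample.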

mattpitkin · January 7, 2019, 5:00pm

Thanks, I’ll have a think about whether this might be a problem for me and if I have any ideas for a fix I’ll be sure to post them on the open issue (although with my very, very limited theano knowledge I doubt I’ll be much help!)

mattpitkin · January 9, 2019, 9:54pm

I’ve just posted a potential fix for this here. It just involves adding the lines

if isinstance(var.tag.test_value, np.ndarray):
    if len(var.tag.test_value) == 1:
        shared.type = theano.tensor.TensorType(var.dtype, (True,))

in model.py after this line.

mattpitkin · January 15, 2019, 10:45pm

This is fixed in PyMC3 with this PR. This is not in a release yet though.
