Dealing with 1 missing observation

I am experimenting with a logistic regression that has missing values in the predictor. As suggested here (and elsewhere), I am masking the missing values with a numpy masked array. This way I can predict the values of the missing data.
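For readers who have not used masked arrays, here is a minimal sketch of the masking step. The data and the sentinel value -999 are hypothetical and are not taken from the notebook; only the NumPy calls are real.

```python
import numpy as np

# Hypothetical predictor: -999 is an assumed sentinel for "missing".
raw = np.array([1, 0, -999, 1, 0])

# Mask every entry equal to the sentinel; masked entries are treated as missing.
obs_ma = np.ma.masked_equal(raw, -999)

# Number of masked (missing) entries. The error in this thread shows up
# when this count is exactly 1.
n_missing = int(np.ma.count_masked(obs_ma))

print(obs_ma)
print(n_missing)
```

Counting the masked entries before building the model is a quick way to tell whether a dataset falls into the single-missing-value case described below.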

Everything works fine, but if there is only 1 missing observation Theano complains:

TypeError: Cannot convert Type TensorType(int64, vector) (of Variable obs_t_minus_1_missing_missing_shared__) into Type TensorType(int64, (True,)). You can try to manually convert obs_t_minus_1_missing_missing_shared__ into a TensorType(int64, (True,)).

If I have, for example, 2 missing observations then everything works fine.
Do you know how I can solve this? I am not really at ease with Theano, unfortunately.
I have set up a notebook here, if you are curious.

The part of the error that reads TensorType(int64, vector) into Type TensorType(int64, (True,)) might hint at a data-type issue. (I sometimes find it tricky to follow the internal numpy array dtypes.)

print(obs_t_minus_1.dtype)
print(obs_t_minus_1_ma.dtype)

Both return int64. Shouldn’t they be boolean?
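On the dtype question, a small sketch with made-up data (not the notebook's arrays) suggests that int64 is expected here: a masked array reports the dtype of its underlying data, and only the mask itself is boolean.

```python
import numpy as np

# Made-up integer data with one masked entry.
data = np.array([1, 0, 1, 0], dtype=np.int64)
data_ma = np.ma.masked_array(data, mask=[False, False, True, False])

# .dtype describes the stored values, not the mask.
print(data_ma.dtype)

# The mask is a separate boolean array of the same shape.
print(data_ma.mask.dtype)
```

So an integer dtype on both arrays does not by itself indicate a problem.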

Hmm… When I have 2 missing data points (and everything works fine), the dtype is also integer, so I do not think that’s the problem. I suspect the issue is that the tensor has shape (1,) instead of (1, 1). I have run into similarly odd errors with numpy before, which is why I tend to use vectors with 2 dimensions (via the function atleast_2d). I think the fix needs to be made in the PyMC3 code rather than in my code.
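To make the shape point concrete, here is a NumPy-only sketch with hypothetical index arrays for one and two missing values. As far as I understand Theano's notation, (True,) in the error message is a broadcastable pattern marking a dimension fixed at length 1, whereas vector leaves the length open. That would be consistent with the failure appearing only when exactly one value is missing, but treat it as a reading of the message rather than a confirmed diagnosis.

```python
import numpy as np

# Hypothetical positions of the missing values.
one_missing = np.array([7])       # exactly one missing observation
two_missing = np.array([7, 12])   # two missing observations

# A single missing value gives a 1-D array of length 1.
print(one_missing.shape)                  # (1,)
print(two_missing.shape)                  # (2,)

# atleast_2d prepends an axis, so the same data becomes a 1 x n array.
print(np.atleast_2d(one_missing).shape)   # (1, 1)
print(np.atleast_2d(two_missing).shape)   # (1, 2)
```

Note that atleast_2d only changes the shape on the NumPy side. Whether it avoids the Theano error depends on how PyMC3 builds its internal variables, which this sketch does not cover.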

It’s quite likely a PyMC3 / Theano bug. Could you please file an issue on GitHub?

Just to note that this is fixed in PyMC3 with this PR. It is not in a release yet, though.
