Deterministic function of observed variable with missing values

rpgoldman · September 10, 2019, 3:50pm

I have a dataset (OECD country data) that has missing values. I’m building a model around a transformed variable (percentages rescaled to the 0 to 1.0 range and centered at zero) whose observations include some missing values.

I added a pm.Deterministic so I can read off the actual values by inverting this transformation (multiplying by 100 and adding back the offset k, which is 40 here):

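# Observed variable on the transformed scale; the observed data contain missing values.
oecd_soc_spend = pm.Normal('z(OECD social spending pct)',
                           mu=soc_dem * soc_dem_spend_factor,
                           observed=(dataset['Social Welfare (pct NNI)'] - 40)/100.0)
# Invert the transformation to recover percentages.
pm.Deterministic('OECD social spending', oecd_soc_spend * 100.0 + 40.0)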
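
For readers following along, the transformation and its inverse can be checked in isolation, outside of any model. This is only a sketch: the helper names `to_z` and `from_z` are made up for illustration, and the constants 40 and 100 are taken from the snippet above.

```python
import math

CENTER = 40.0   # offset subtracted in the snippet above
SCALE = 100.0   # divisor used in the snippet above


def to_z(pct):
    """Forward transform applied to the observed data (hypothetical helper)."""
    return (pct - CENTER) / SCALE


def from_z(z):
    """Inverse transform, the same arithmetic as in the Deterministic."""
    return z * SCALE + CENTER


pct = 25.0
z = to_z(pct)
# The round trip should recover the original percentage up to float error.
assert math.isclose(from_z(z), pct)
```

The arithmetic round-trips cleanly, which suggests the error below comes from how the missing values are wired into the graph rather than from the transformation itself.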

But when I try to take a prior predictive sample from this distribution I get the following error:

ValueError: array is not broadcastable to correct shape
Apply node that caused the error: AdvancedIncSubtensor1{no_inplace,set}(TensorConstant{[ 1.0000e+...6176e-01]}, z(OECD social spending pct)_missing, TensorConstant{[  0   1  ..1 262 263]})
Toposort index: 0
Inputs types: [TensorType(float64, vector), TensorType(float64, vector), TensorType(int64, vector)]
Inputs shapes: [(265,), (265,), (229,)]
Inputs strides: [(8,), (8,), (8,)]
Inputs values: ['not shown', 'not shown', 'not shown']
Outputs clients: [[Elemwise{Composite{(i0 + (i1 * i2))}}[(0, 2)](TensorConstant{(1,) of 40.0}, TensorConstant{(1,) of 100.0}, z(OECD social spending pct))]]
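
Going only by the "Inputs shapes" line, the failing op is asked to write a 265-element vector into 229 indexed positions. The pure-Python sketch below mimics that kind of indexed assignment to show why those lengths cannot line up. The helper `set_at` is hypothetical and stricter than real array libraries (it does not broadcast scalars), and the index positions are placeholders, not the ones from my data.

```python
def set_at(base, values, index):
    """Mimic base[index] = values for flat lists (hypothetical helper)."""
    if len(values) != len(index):
        raise ValueError(
            f"cannot set {len(index)} positions from {len(values)} values"
        )
    out = list(base)
    for i, v in zip(index, values):
        out[i] = v
    return out


n_total, n_index = 265, 229      # lengths reported in the traceback
base = [0.0] * n_total
index = list(range(n_index))     # placeholder positions only

# Matching lengths: one value per indexed position, so the write succeeds.
ok = set_at(base, [1.0] * n_index, index)

# Mismatched lengths, as in the traceback: 265 values for 229 positions.
try:
    set_at(base, [1.0] * n_total, index)
    mismatch = None
except ValueError as err:
    mismatch = str(err)
```

If this reading is right, the value vector fed to the op during prior predictive sampling has the full data length rather than the length of the index vector.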

That 229 in the input shapes made me suspicious. When I render the graph, I see that the observed node has a parent node labeled "z(OECD social spending pct)_missing ~ NoDistribution", and that this node is a vector of length 229. 229 is the number of true values in the masked array that holds the observations of the variable "z(OECD social spending pct)".
What am I doing wrong here?
Thanks!

rpgoldman · September 10, 2019, 4:38pm

P.S. The documentation seems clear that PyMC3 should impute values from the model for missing data and, indeed, when I remove the deterministic here from the model, everything seems to work fine. So is this a bug in the way Deterministic RVs are handled?
