I have a dataset (OECD country data) with missing values. I'm building a model around a transformed variable (percentages rescaled to the range 0 to 1.0, then centered at zero), and some of its observations are missing.
I made a pm.Deterministic to let me read off the actual values by inverting this transformation (multiplying by 100 and adding k):
oecd_soc_spend = pm.Normal('z(OECD social spending pct)',
                           mu=soc_dem * soc_dem_spend_factor,
                           observed=(dataset['Social Welfare (pct NNI)'] - 40)/100.0)
pm.Deterministic('OECD social spending', oecd_soc_spend * 100.0 + 40.0)
But when I try to take a prior predictive sample from this distribution I get the following error:
ValueError: array is not broadcastable to correct shape
Apply node that caused the error: AdvancedIncSubtensor1{no_inplace,set}(TensorConstant{[ 1.0000e+...6176e-01]}, z(OECD social spending pct)_missing, TensorConstant{[ 0 1 ..1 262 263]})
Toposort index: 0
Inputs types: [TensorType(float64, vector), TensorType(float64, vector), TensorType(int64, vector)]
Inputs shapes: [(265,), (265,), (229,)]
Inputs strides: [(8,), (8,), (8,)]
Inputs values: ['not shown', 'not shown', 'not shown']
Outputs clients: [[Elemwise{Composite{(i0 + (i1 * i2))}}[(0, 2)](TensorConstant{(1,) of 40.0}, TensorConstant{(1,) of 100.0}, z(OECD social spending pct))]]
That 229 in the inputs shapes made me suspicious, and when I render the graph, I see that the observed node has as parent a node labeled z(OECD social spending pct)_missing ~ NoDistribution, and that this node is a vector of length 229. 229 is the number of true values in the masked array that is the observations of the variable "z(OECD social spending pct)".
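For reference, here is a minimal sketch of how the counts in a masked array can be checked outside the model. It assumes the observations end up as a NumPy masked array built from a column that uses NaN for missing entries. The toy values and the names raw_pct, z, n_total, n_missing and n_observed are made up for illustration and are not from the OECD dataset.

```python
import numpy as np

# Hypothetical toy column standing in for dataset['Social Welfare (pct NNI)'];
# NaN marks a missing observation. These values are invented.
raw_pct = np.array([35.0, np.nan, 42.5, 51.0, np.nan])

# Same transformation as in the model: subtract 40, then divide by 100.
# masked_invalid masks every NaN entry.
z = np.ma.masked_invalid((raw_pct - 40.0) / 100.0)

n_total = z.size                # all entries, observed or missing
n_missing = int(z.mask.sum())   # mask is True where the value is missing
n_observed = int(z.count())     # count() gives the number of non-masked entries

# Inverting the transformation recovers the percentages for observed entries;
# masked entries stay masked.
recovered = z * 100.0 + 40.0

print(n_total, n_missing, n_observed)
print(recovered)
```

Comparing these three numbers for the real column against the 265 and 229 in the input shapes of the error should show whether 229 is the number of observed or of missing entries.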
What am I doing wrong here?
Thanks!