_missing variables

In “Getting Started with PyMC3”, Case Study 2 shows an example involving missing variables.

The variable disasters is an array indexed by year and containing the number of disasters in that year, but some values are missing.

Sampling returns a trace that contains a variable disasters_missing, which I assume holds samples with the same shape as disasters, but with the missing values imputed.

However, the plot from pm.traceplot(trace) shows disasters_missing being treated like a single integer, not an array of integers. I assume this is a histogram of the average of the imputed missing values for each sampled array; is this assumption correct? I suppose another reasonable choice would be the sum of the imputed values.

When the observed data is a masked NumPy array, the masked values are modeled as a new random variable named *_missing. The shape of this new random variable equals the number of missing values. So:

The shape of disasters_missing is the number of missing values. For example, if there are two years of data missing (which I think is the case here), then the shape is 2.
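To make the shape point concrete, here is a minimal NumPy-only sketch. The array below is a short hypothetical stand-in, not the real disasters data, and using -999 as the placeholder for missing years is an assumption made for illustration.

```python
import numpy as np

# Hypothetical yearly counts (NOT the real dataset); -999 marks a missing year.
raw = np.array([4, 5, 4, 0, -999, 3, 1, -999, 2, 0])

# Mask the placeholder so the missing years are flagged rather than used as data.
disasters = np.ma.masked_values(raw, value=-999)

# Number of masked entries. Per the explanation above, this should be the
# length of the disasters_missing variable, not the length of disasters.
n_missing = int(np.ma.count_masked(disasters))

print(disasters.shape)  # shape of the full array, one entry per year
print(n_missing)        # number of masked (missing) years
```

If a masked array like this is passed as observed to a PyMC3 likelihood, the description above implies the trace would hold n_missing values per draw for disasters_missing, one per masked year.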


Thanks. I am still having trouble interpreting its trace plot, though.

I see multiple colors, but I don’t know whether they refer to the two different missing values, to the four chains I ran, or to both. The traceplot documentation does not seem to explain that.

Based on other trace plots, I guess it is showing both the different chains and the different dimensions of the variable. For continuous values, one can tell the dimensions apart easily, as they typically form separate and distinct curves. With discrete values like these, how can I interpret this plot?

image

Yeah, the colors are not the easiest to interpret in this case. Here there are 2 chains, and the plot displays posterior samples for 2 missing values.
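One way to sidestep the overlapping colors is to summarize each missing value separately. The sketch below uses synthetic Poisson draws as a stand-in for the real posterior samples; the array layout (chain, draw, missing value) and the rates are assumptions for illustration only. With an actual PyMC3 trace, something like trace.get_values('disasters_missing', combine=False) should give one (draws, n_missing) array per chain to work with instead.

```python
import numpy as np

rng = np.random.default_rng(0)
n_chains, n_draws, n_missing = 2, 500, 2

# Synthetic stand-in for posterior draws, laid out as (chain, draw, missing value).
samples = rng.poisson(lam=[2.0, 1.0], size=(n_chains, n_draws, n_missing))

# The plot overlays one series per (chain, missing value) pair.
n_series = n_chains * n_missing

# Tabulate each pair on its own instead of reading overlapping colors.
for chain in range(n_chains):
    for k in range(n_missing):
        values, counts = np.unique(samples[chain, :, k], return_counts=True)
        print(f"chain {chain}, missing value {k}:", dict(zip(values.tolist(), counts.tolist())))

# Pool the chains to get one posterior mean per missing value.
per_value_mean = samples.reshape(-1, n_missing).mean(axis=0)
print(per_value_mean)
```

Each printed dictionary is the discrete histogram for one chain and one missing year, which is the same information the trace plot draws on a single axis.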
