_missing variables

rodrigobraz · November 19, 2018, 3:18am

In “Getting Started with PyMC3”, Case Study 2 shows an example involving missing variables.

The variable disasters is an array indexed by year and containing the number of disasters in that year, but some values are missing.

Sampling returns a trace that contains a variable disasters_missing, which I assume holds samples with the same shape as disasters, but with the missing values imputed.

However, the plot from pm.traceplot(trace) shows disasters_missing being treated like a single integer, not an array of integers. I assume this is a histogram of the average of the imputed missing values for each sampled array; is this assumption correct? I suppose another reasonable choice would be the sum of the imputed values.

junpenglao · November 19, 2018, 5:45am

When the observed data is a masked numpy array, the masked values are modeled as a new random variable named *_missing. The length of the new random variable equals the number of missing values. So:

The shape of disasters_missing is the number of missing values. For example, if two years of data are missing (I think that is the case here?), then the shape is 2.
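To check how many values are masked, and so what length to expect for the *_missing variable, something like the following numpy-only sketch should work. The array here is made up for illustration (it is not the tutorial's data), and -999 as the missing-value marker is an assumption.

```python
import numpy as np

# Hypothetical yearly counts; -999 marks a missing year (illustrative data only).
raw = np.array([4, 5, -999, 1, 0, -999, 3])

# Mask every entry equal to the marker value.
counts = np.ma.masked_values(raw, value=-999)

# Number of masked entries: the expected length of the *_missing variable.
n_missing = int(np.ma.count_masked(counts))

# Positions (e.g. year offsets) of the masked entries.
missing_idx = np.flatnonzero(np.ma.getmaskarray(counts))

print(n_missing, missing_idx)
```

Each element of the *_missing variable then corresponds to one of the masked positions, in order.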

rodrigobraz · November 19, 2018, 7:26pm

Thanks. I am still having trouble interpreting its trace plot, though.

I see multiple colors but I don’t know if they are referring to the two different missing values, or to the four chains I had, or both, or how. The traceplot documentation does not seem to explain that.

Based on other trace plots, I guess it is showing both the different chains and the different dimensions of the variable. For continuous values one can tell the dimensions apart easily, since they typically form separate and distinct curves. But how can I interpret this plot?

junpenglao · November 19, 2018, 8:16pm

Yeah, the colors are not the easiest to interpret in this case. The plot here shows 2 chains, each displaying the posterior samples for the 2 missing values.
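If the combined plot is hard to read, one option is to pull the samples out and look at each missing value separately. A rough sketch with simulated draws standing in for the real trace: the (chain, draw, missing value) layout, the Poisson rates, and the names below are all assumptions for illustration, not output from the tutorial model.

```python
import numpy as np

rng = np.random.default_rng(0)
n_chains, n_draws, n_missing = 2, 500, 2

# Simulated stand-in for the draws of disasters_missing,
# laid out as (chain, draw, missing value). Rates are arbitrary.
samples = rng.poisson(lam=[2.0, 1.0], size=(n_chains, n_draws, n_missing))

# One series per (chain, missing value) pair: 2 chains x 2 values = 4 series,
# which is what the overlapping colors in the combined plot correspond to.
series = {
    (c, j): samples[c, :, j]
    for c in range(n_chains)
    for j in range(n_missing)
}

# Pool the chains to summarize each missing value on its own.
pooled = samples.reshape(-1, n_missing)
posterior_means = pooled.mean(axis=0)

print(len(series), posterior_means)
```

So each missing value gets its own set of samples. Nothing is averaged or summed across the missing values unless you do that yourself.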
