Impute results in mismatch dimensions in dims and data

Ah yeah, you’re right, and 3 + 2 = 5, so the dims error results from the _missing and _observed dimensions not being created correctly.

It seems that pm.Data doesn’t like missing values, and that ArviZ doesn’t like missing values in matrices. No idea if there’s a way to do this with wide data, but I think you can sidestep it by using long data instead: missing values in a vector shouldn’t cause any issues. Something like:

    import pandas as pd

    # Drop the mask column and flatten to long format
    velocity_long = velocity_data.to_dataframe().drop(columns='mask').reset_index()
    load_long = load_data.to_dataframe().drop(columns='mask').reset_index()

    # Integer codes for each grouping level, in order of appearance
    exercise_idx, exercises = pd.factorize(load_long.exercise)
    date_idx, date = pd.factorize(load_long.date)
    exercise_date_idx, exercise_date = pd.factorize(
        load_long.exercise.astype(str) + '_' + load_long.date.astype(str)
    )

    velocity_data = velocity_long.velocity_std_masked.values
    load_data = load_long.load_std_masked.values
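If you’re unsure what factorize gives you, here’s a toy sketch (the exercise and date values are made up for illustration):

```python
import pandas as pd

# Toy long-format frame standing in for load_long (hypothetical values)
load_long = pd.DataFrame({
    'exercise': ['squat', 'squat', 'bench', 'bench'],
    'date': ['2023-01-01', '2023-01-02', '2023-01-01', '2023-01-02'],
})

# factorize returns integer codes plus the unique labels, in order of appearance
exercise_idx, exercises = pd.factorize(load_long.exercise)
print(list(exercise_idx))   # [0, 0, 1, 1]
print(list(exercises))      # ['squat', 'bench']

# Combined exercise-date label: one code per unique pair
exercise_date_idx, exercise_date = pd.factorize(
    load_long.exercise + '_' + load_long.date
)
print(list(exercise_date_idx))   # [0, 1, 2, 3]
```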

The only tricky change was the dates. You want n_date draws from a normal distribution centered on the exercise mean. You can do this by passing the mean and standard deviation directly, then using “date” as a batch dimension:

    wide_date_intercept = pm.Normal('wide_date_intercept', mu = exercise_intercept, sigma = exercise_intercept_sd, dims = ['date', 'exercise'])
    wide_date_slope = pm.Normal('wide_date_slope', mu = exercise_slope, sigma = exercise_slope_sd, dims = ['date', 'exercise'])
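As a sanity check on the shapes, here’s a NumPy analogue of that batch-dimension broadcast (not PyMC itself; the means are hypothetical): a length-3 exercise mean broadcasts across the leading date dimension, giving one draw per date-exercise cell.

```python
import numpy as np

rng = np.random.default_rng(0)

n_date, n_exercise = 7, 3
exercise_intercept = np.array([0.5, 1.0, 1.5])   # hypothetical exercise means
exercise_intercept_sd = 0.1

# One draw per (date, exercise) cell, centered on the exercise mean:
# the length-3 mean broadcasts across the leading "date" batch dimension
wide_date_intercept = rng.normal(
    loc=exercise_intercept, scale=exercise_intercept_sd, size=(n_date, n_exercise)
)
print(wide_date_intercept.shape)   # (7, 3)
```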

So the result is a 7x3 (n_date x n_exercise) matrix that looks like this:

[['D1E1', 'D1E2', 'D1E3'],
 ['D2E1', 'D2E2', 'D2E3'],
 ['D3E1', 'D3E2', 'D3E3'],
 ['D4E1', 'D4E2', 'D4E3'],
 ['D5E1', 'D5E2', 'D5E3'],
 ['D6E1', 'D6E2', 'D6E3'],
 ['D7E1', 'D7E2', 'D7E3']]

To make the exercise_date_idx line up, you can transpose it to 3x7 then ravel, so the result looks like this:

['D1E1', 'D2E1', 'D3E1', 'D4E1', 'D5E1', 'D6E1', 'D7E1', 'D1E2', 'D2E2', 'D3E2', 'D4E2', 'D5E2', 'D6E2', 'D7E2', 'D1E3', 'D2E3', 'D3E3', 'D4E3', 'D5E3', 'D6E3', 'D7E3']

Which matches the data, and can be indexed by exercise_date_idx to get n_observations per exercise-date pair:

    date_intercept = pm.Deterministic('date_intercept', wide_date_intercept.T.ravel()[exercise_date_idx])
    date_slope = pm.Deterministic('date_slope', wide_date_slope.T.ravel()[exercise_date_idx])
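To convince yourself the ordering works out, here’s a small NumPy sketch of the transpose-then-ravel step, using a label matrix in place of the actual parameters (the example exercise_date_idx is hypothetical):

```python
import numpy as np

n_date, n_exercise = 7, 3

# Label matrix with the same (date, exercise) layout as wide_date_intercept
labels = np.array([[f'D{d+1}E{e+1}' for e in range(n_exercise)]
                   for d in range(n_date)])

# Transpose to (exercise, date), then ravel: all dates of E1 first, then E2, E3
flat = labels.T.ravel()
print(list(flat[:7]))   # ['D1E1', 'D2E1', 'D3E1', 'D4E1', 'D5E1', 'D6E1', 'D7E1']

# Indexing the flattened array with exercise_date_idx (hypothetical here: two
# observations of D1E1, one of D2E1) expands to one value per observation
exercise_date_idx = np.array([0, 0, 1])
print(list(flat[exercise_date_idx]))   # ['D1E1', 'D1E1', 'D2E1']
```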

Then, since everything is already expanded, mu just becomes:

    mu = date_intercept + date_slope * velocity

I’m pretty sure this is equivalent, but you should definitely double check.