I am defining the model using pm.MutableData and setting the dims and coords within each data container. I was trying to follow the "PyMC 4.0 with labeled coords and dims" post on Oriol unraveled.
with pm.Model() as base_model:
    # --- data containers ---
    fs_ = pm.MutableData(name="fs", value=fm.values, dims=("date", "fourier_mode"), coords={"date": mmm_dat["date"].values, "fourier_mode": np.arange(2 * n_order)})
    adstock_sat_media_ = pm.MutableData(name="asm", value=media_scaled, dims=("date", "channel"), coords={"date": mmm_dat["date"].values, "channel": np.arange(len(media_variables))})
    target_ = pm.MutableData(name="target", value=target_scaled, dims="date", coords={"date": mmm_dat["date"].values})
    t_ = pm.MutableData(name="t", value=t_scaled, dims="date", coords={"date": mmm_dat["date"].values})
When I make predictions using data of the same size (i.e. the exact training data), sample_posterior_predictive works as expected. However, when I change the input variables to a different size (fewer rows: 20 instead of 104) as follows:
test_coords = {"date": new_mmm_dat["date"], "fourier_mode": np.arange(2 * n_order), "channel": np.arange(4)}
pm.set_data({"asm": new_media_scaled, "t": new_t_scaled, "fs": new_fm.values}, model=base_model, coords=test_coords)
pred_oos = pm.sample_posterior_predictive(trace=base_model_trace, model=base_model, predictions=True, extend_inferencedata=False, random_seed=rng)
I get the following error:
ValueError: all the input array dimensions except for the concatenation axis must match exactly, but along
dimension 1, the array at index 0 has size 20 and the array at index 1 has size 21
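As far as I can tell, the wording of this message matches what NumPy raises when the arrays passed to np.concatenate disagree on an axis other than the concatenation axis. The sketch below reproduces that kind of message with two hypothetical arrays (20 and 21 columns). It only illustrates the error and is not my model. The helper name concat_error is made up for the example.

```python
import numpy as np


def concat_error(n_first, n_second):
    """Return the message NumPy raises when stacking rows of unequal length, or None."""
    first = np.zeros((1, n_first))    # hypothetical array with n_first columns
    second = np.zeros((1, n_second))  # hypothetical array with n_second columns
    try:
        # Concatenating along axis 0 requires axis 1 to match exactly.
        np.concatenate([first, second], axis=0)
    except ValueError as err:
        return str(err)
    return None


if __name__ == "__main__":
    print(concat_error(20, 21))  # mismatched lengths: prints the ValueError text
    print(concat_error(20, 20))  # matching lengths: prints None
```

If that reading is right, an array of length 20 is being combined with one of length 21 somewhere during sampling.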
All of the new input arrays have 20 rows. I also see that when sample_posterior_predictive is called, it reports sampling not only the likelihood but also another coefficient from the model:
Sampling: [b_fourier, likelihood]
Questions:
- How can I produce out-of-sample predictions when the new data has a different number of rows than the training data?
- Can sample_posterior_predictive return intermediate values (deterministics) computed from the new data, and not just the likelihood?