Hi, how’s it going? I’m working through Rethinking 2 and was wondering if you could help me finish this code so that it makes out-of-sample predictions.
Here is a simple example of what I have so far:
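import numpy as np
import pandas as pd
import pymc as pm  # `import pymc3 as pm` on older installs
import xarray as xr

data = pd.DataFrame({
    "var1": np.random.random_sample(12),
    "cat_var": ["a", "b"] * 6,
    "y": np.random.randint(5, 20, size=12),
})
# Note: pd.get_dummies(data, columns=["cat_var"]) would drop "cat_var",
# so the intercepts are indexed by integer category code instead.
cid = pd.Categorical(data["cat_var"])  # 'a' -> 0, 'b' -> 1

with pm.Model() as m_5_1:
    a = pm.Normal("a", 1, 0.1, shape=cid.categories.size)  # one intercept per category
    b = pm.Normal("b", 0, 0.3)
    sigma = pm.Uniform("sigma", 0, 1)
    mu = pm.Deterministic("mu", a[cid.codes] + b * data["var1"].values)
    y = pm.Normal("y", mu=mu, sigma=sigma, observed=data["y"].values)
    trace = pm.sample()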
Now, if I just wanted to make out-of-sample predictions without the categorical variable, I would do:
new_data_0 = xr.DataArray(
    newdata['var1'],
    dims=["pred_id"]
)
pred_mean = (
    trace["a"][:newdata.shape[0]] +
    trace["b"][:newdata.shape[0]] * new_data_0
)
rng = np.random.default_rng()
predictions = xr.apply_ufunc(
    lambda mu, sd: rng.normal(mu, sd),
    pred_mean,
    trace["sigma"][:newdata.shape[0]],
)
How would I add the categorical variable from new, unseen data to this xarray format for predictions? In this case the codes would be 0 for the letter “a” and 1 for the letter “b”.
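To make the question concrete, here is a rough sketch of the kind of thing I have in mind, not a confirmed solution. It assumes the posterior is an xarray Dataset (as in `trace.posterior`) where `a` has an extra dimension that I am calling `a_dim_0`. The draws below are random stand-ins so the snippet runs without sampling, and the `newdata` frame is made up for illustration. The idea is to encode the new categories with the same category order as in training and use them as a vectorized indexer along the category dimension.

```python
import numpy as np
import pandas as pd
import xarray as xr

rng = np.random.default_rng(0)

# Stand-in for trace.posterior: fake draws with assumed dims
# (chain, draw) for scalars and (chain, draw, a_dim_0) for the vector "a".
n_chain, n_draw = 2, 500
posterior = xr.Dataset(
    {
        "a": (("chain", "draw", "a_dim_0"), rng.normal(1.0, 0.1, (n_chain, n_draw, 2))),
        "b": (("chain", "draw"), rng.normal(0.0, 0.3, (n_chain, n_draw))),
        "sigma": (("chain", "draw"), rng.uniform(0.0, 1.0, (n_chain, n_draw))),
    }
)

# Hypothetical unseen data.
newdata = pd.DataFrame({"var1": [0.1, 0.5, 0.9], "cat_var": ["b", "a", "b"]})

# Encode with the SAME category order used when fitting ('a' -> 0, 'b' -> 1).
new_cid = pd.Categorical(newdata["cat_var"], categories=["a", "b"])

new_var1 = xr.DataArray(newdata["var1"].to_numpy(), dims=["pred_id"])
new_codes = xr.DataArray(new_cid.codes.astype(int), dims=["pred_id"])

# Vectorized indexing: selecting along a_dim_0 with a DataArray indexer
# replaces that dimension with pred_id, one intercept per new row.
a_new = posterior["a"].isel(a_dim_0=new_codes)

# xarray broadcasts by dimension name: result has chain, draw, pred_id.
pred_mean = a_new + posterior["b"] * new_var1

# One simulated observation per posterior draw and new row.
predictions = xr.apply_ufunc(
    lambda mu, sd: rng.normal(mu, sd), pred_mean, posterior["sigma"]
)
```

If this is roughly right, `predictions` should carry one simulated `y` per chain, draw, and new row, but I am not sure whether this is the recommended approach.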
Thank you!