How to make out-of-sample predictions with categorical variables? Implementation question from the Rethinking 2 book

Hi, how’s it going? I’m going through Rethinking 2 and was wondering if you could help me finish this code so that it makes out-of-sample predictions.

Here is my simple example of what I have so far…

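import numpy as np
import pandas as pd
import pymc3 as pm
import xarray as xr

# Toy data: one continuous predictor, one two-level categorical predictor, one outcome
data = pd.DataFrame({
    'var1': np.random.random_sample(12),
    'cat_var': ['a', 'b'] * 6,
    'y': np.random.randint(5, 20, size=12),
})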


# Keep 'cat_var' as a single column (no get_dummies) and index by its integer codes: 'a' -> 0, 'b' -> 1
cid = pd.Categorical(data["cat_var"])

with pm.Model() as m_5_1:
    a = pm.Normal("a", 1, 0.1, shape=cid.categories.size)  # one intercept per category
    b = pm.Normal("b", 0, 0.3)
    sigma = pm.Uniform("sigma", 0, 1)
    mu = pm.Deterministic("mu", a[cid.codes] + b * data["var1"].values)
    y = pm.Normal("y", mu=mu, sigma=sigma, observed=data["y"].values)
    trace = pm.sample()

Now if I were to just make out of sample predictions without the categorical variable, I would do:

new_data_0 = xr.DataArray(
    newdata['var1'],
    dims=["pred_id"]
)

pred_mean = (
    trace["a"][:newdata.shape[0]] +
    trace["b"][:newdata.shape[0]] * new_data_0 
)

rng = np.random.default_rng()
predictions = xr.apply_ufunc(lambda mu, sd: rng.normal(mu, sd), pred_mean, trace["sigma"][:newdata.shape[0]])

How would I add the categorical variable from new, unseen data into this xarray format for predictions? In this case the codes would be 0 for the letter “a” and 1 for the letter “b”.

Thank you!
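For the manual route described in the question, one way to bring the category in is to index the posterior draws of `a` by the integer codes of the new rows. The following is only a minimal NumPy sketch under stated assumptions: the synthetic arrays stand in for `trace["a"]`, `trace["b"]` and `trace["sigma"]`, their shapes are assumed, and `new_var1`, `new_cat` and `code_of` are hypothetical names that do not appear in the original post.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for posterior draws (assumed shapes):
#   a     -> (n_draws, n_categories)
#   b     -> (n_draws,)
#   sigma -> (n_draws,)
n_draws = 1000
a_draws = rng.normal(1.0, 0.1, size=(n_draws, 2))
b_draws = rng.normal(0.0, 0.3, size=n_draws)
sigma_draws = rng.uniform(0.0, 1.0, size=n_draws)

# Hypothetical unseen rows
new_var1 = np.array([0.1, 0.5, 0.9])
new_cat = ["b", "a", "b"]

# Reuse the label -> code mapping from training ('a' -> 0, 'b' -> 1)
code_of = {"a": 0, "b": 1}
new_codes = np.array([code_of[label] for label in new_cat])

# Pick each new row's intercept column: result has shape (n_draws, n_new)
intercepts = a_draws[:, new_codes]

# Broadcast the slope draws over the new rows to get the linear predictor
pred_mean = intercepts + b_draws[:, None] * new_var1[None, :]

# One simulated outcome per posterior draw and per new row
predictions = rng.normal(pred_mean, sigma_draws[:, None])
```

The key point is that the draw axis and the prediction axis stay separate: every posterior draw is combined with every new row, rather than slicing the draws to the length of the new data.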

Hi,
I think you can use a combination of pm.Data and pm.sample_posterior_predictive to do that, which is more PyMC-idiomatic.
The repo of the port of Rethinking_2 to PyMC should have all the help you need :wink:
PyMCheers :vulcan_salute: