# Statistical Rethinking 2nd Ed, Ch4, Exercise 4H1

Hi all,

I hope this is the right place to post my question.
I am trying to solve question 4H1 of Statistical Rethinking (see here). In brief, there is a dataset with the heights and weights of different people, and a model was previously trained on all of the available data. Now we want to make predictions for some new weights and return the 89% HDI.

I attempted to solve the question in this way:

```python
import arviz as az
import pymc as pm

# d2 (the adult heights/weights subset) is assumed loaded from earlier in the chapter
weights = [47, 43.7, 64.8, 32.6, 54.6]
xbar = d2.weight.mean()

with pm.Model() as m4_3:
    a = pm.Normal("a", mu=178, sigma=20)
    b = pm.Lognormal("b", mu=0, sigma=1)
    sigma = pm.Lognormal("sigma", mu=0, sigma=1)
    mu = a + b * (d2.weight.values - xbar)

    height = pm.Normal("height", mu=mu, sigma=sigma, observed=d2.height.values)

    trace_4_3 = pm.sample(1000, tune=1000)

with m4_3:
    m4_3_trace = pm.sample(1000, tune=2000)
az.plot_trace(m4_3_trace)

data_4_3 = az.extract_dataset(m4_3_trace)
mu_collections = {}

for w in weights:
    mu_collections[w] = data_4_3["a"] + data_4_3["b"] * (w - d2.weight.mean())

for w, v in mu_collections.items():
    print(w, v.mean(), az.hdi(v.values, hdi_prob=0.89))
```

But I realized that the intervals I am printing are probably the ones for the parameter `mu` itself, not for predicted heights.

Then I thought of sampling the parameters `a` and `b` from the posterior, plugging in the weights I want to predict for, and drawing heights based on those parameters.

```python
weights = np.array(weights)
post_heights = []

for i in range(len(posterior_checks['posterior_predictive']['a'])):
    mu_pr = m4_3_trace['posterior']['a'][i] + m4_3_trace['posterior']['b'][i] * (weights - d2.weight.mean())
    sigma_pr = m4_3_trace['posterior']['sigma'][i]
    post_heights.append(np.random.normal(mu_pr, sigma_pr))
post_heights = np.array(post_heights)

for i, weight in enumerate(weights):
    print(weight, np.percentile(post_heights[:, i], [5.5, 100 - 5.5]))
```

The code does not work because the shapes do not match (and I am struggling to understand how to make them match — that is more a coding issue than a PyMC one, but if someone can help, more than welcome!). My questions are:

1. Is the reasoning correct?
2. Is there an easier way to make predictions once the posteriors are generated?
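
For what it's worth, the per-draw loop above can be replaced by broadcasting over flattened posterior draws. A minimal NumPy sketch of the shape logic, with synthetic draws standing in for the real trace (in practice `a`, `b`, and `sigma` would come from something like `az.extract(m4_3_trace)["a"].values`):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4000  # total posterior draws (all chains flattened)

# Synthetic stand-ins for the flattened posterior samples
a = rng.normal(154, 0.3, n)            # (n,) draws of the intercept
b = rng.normal(0.9, 0.05, n)           # (n,) draws of the slope
sigma = np.abs(rng.normal(5, 0.3, n))  # (n,) draws of the residual sd

weights = np.array([47, 43.7, 64.8, 32.6, 54.6])
centered = weights - 45.0              # stand-in for weights - d2.weight.mean()

# (n, 1) broadcast against (5,) gives (n, 5): one mu per draw per weight
mu = a[:, None] + b[:, None] * centered
post_heights = rng.normal(mu, sigma[:, None])  # (n, 5) simulated heights

for i, w in enumerate(weights):
    print(w, np.percentile(post_heights[:, i], [5.5, 94.5]))
```

The key is adding the trailing axis with `[:, None]` so the draws dimension and the weights dimension broadcast against each other instead of colliding.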

Your reasoning is correct; your first block of code computes `mu` for each of the new observations, not the posterior predictive distribution.

The easiest way to handle this is to use a `pm.MutableData` container to hold the training data, then use `pm.set_data` to swap in the out-of-sample data. There’s an example of this here.

That does indeed look like what I am looking for. However, when I try to replicate the example with the same structure, such as:

```python
with pm.Model(
    coords_mutable={"obs_idx": np.arange(len(d2))},
    coords={"parameter": ["intercept", "slope"]},
) as m4_3:
```

I get `TypeError: __init__() got an unexpected keyword argument 'coords_mutable'`.
I see at the top of the page that PyMC v5.6.0 is used, and I have updated my package to that version. Any idea why?

What does `pm.__version__` show you?