Statistical Rethinking 2nd Ed, Ch4, Exercise 4H1

drBayes · September 15, 2023, 12:19pm

Hi all,

I hope this is the right place to post my question.
I am trying to solve the question 4H1 of statistical rethinking (see here). In brief, there is a dataset with heights and weights of different people and previously a model was trained on all data avaialble. Now we want to make predictions for some weights and return the 89% HDI.

I attempted to solve the question in this way:

from collections import defaultdict
weights = [47,43.7,64.8,32.6,54.6]


with pm.Model() as m4_3:
    
    a = pm.Normal("a", mu=178, sigma=20)
    b = pm.Lognormal("b", mu=0, sigma=1)
    sigma = pm.Lognormal("sigma", mu=0, sigma=1)
    mu = a + b * (d2.weight.values - xbar)

    height = pm.Normal("height", mu=mu, sigma=sigma, observed=d2.height.values)
    
    trace_4_3 = pm.sample(1000, tune=1000)

with m4_3:
    m4_3_trace = pm.sample(1000, tune=2000)
    az.plot_trace(m4_3_trace)

data_4_3 = az.extract_dataset(m4_3_trace)
mu_collections = defaultdict(lambda:(list))

for w in weights:
    mu_collections[w] = data_4_3["a"] + data_4_3["b"] * (w - d2.weight.mean())

for w,v in mu_collections.items():
    print(w,v.mean(),az.hdi(v.values, hdi_prod=0.89))

But I realized that the intervals that I am printing are probably the ones for the actual parameter mu.

Then I thought on trying to sample from the posterior the parameters a and b, inserting the weights that I want to predict, and extract heights based on these parameters.

weights = np.array(weights)
post_heights = []

for i in range(len(posterior_checks['posterior_predictive']['a'][0])):
    mu_pr = m4_3_trace['posterior']['a'][i] + m4_3_trace['posterior']['b'][i]*(weights - d2.weight.mean())
    sigma_pr = m4_3_trace['posterior']['sigma'][i]
    post_heights.append(np.random.normal(mu_pr, sigma_pr))
post_heights = np.array(post_heights)

for i, weight in enumerate(weights):
    print(weight, np.percentile(post_heights[:, i], [5.5, 100-5.5]))

The code does not work beacause the shapes do not match (and I am struggling to understand how to make them, but that’s more a coding issue than pymc related, but if someone can help, more than welcome!), but my questions are:

Is the reasoning correct?
is there an easier way to make predictions once posteriors are generated?

Thank you in advance!

jessegrabowski · September 15, 2023, 1:49pm

Your reasoning is correct, your first block of code computes mu for each of the new observations, not the posterior predictive distribution.

The easiest way to handle this is to use a pm.MutableData container to hold the training data, then use pm.set_data to swap in the out-of-sample data. There’s an example of this here.

drBayes · September 15, 2023, 2:35pm

It indeed looks like what I am looking for. However, I am trying to replicate the example but if I use the same structure, such as:

with pm.Model(
    coords_mutable={"obs_idx": np.arange(len(d2))}, 
    coords={"parameter": ["intercept", "slope"]}
) as m4_3:

I get TypeError: __init__() got an unexpected keyword argument 'coords_mutable'
I see at the begin of the page that it is used PyMC v5.6.0 which I have updated the package too. Any idea why?

jessegrabowski · September 15, 2023, 6:21pm

What does pm.__version__ show you?

Topic		Replies	Views
Repeating a model for two genders version agnostic	2	373	April 5, 2023
Pm.sample_posterior_predictive() not working with weights Questions	12	1316	March 23, 2023
Sample posterior check with pm.Potentials Questions modeling	4	558	March 17, 2023
Parameter Realization Weighting/Frequency modeling	7	600	April 9, 2022
How to weight observations in v4? v5	2	880	July 27, 2022

Statistical Rethinking 2nd Ed, Ch4, Exercise 4H1

Related topics