Hi all,
I hope this is the right place to post my question.
I am trying to solve the question 4H1 of statistical rethinking (see here). In brief, there is a dataset with heights and weights of different people and previously a model was trained on all data avaialble. Now we want to make predictions for some weights and return the 89% HDI.
I attempted to solve the question in this way:
from collections import defaultdict
weights = [47,43.7,64.8,32.6,54.6]
with pm.Model() as m4_3:
a = pm.Normal("a", mu=178, sigma=20)
b = pm.Lognormal("b", mu=0, sigma=1)
sigma = pm.Lognormal("sigma", mu=0, sigma=1)
mu = a + b * (d2.weight.values - xbar)
height = pm.Normal("height", mu=mu, sigma=sigma, observed=d2.height.values)
trace_4_3 = pm.sample(1000, tune=1000)
with m4_3:
m4_3_trace = pm.sample(1000, tune=2000)
az.plot_trace(m4_3_trace)
data_4_3 = az.extract_dataset(m4_3_trace)
mu_collections = defaultdict(lambda:(list))
for w in weights:
mu_collections[w] = data_4_3["a"] + data_4_3["b"] * (w - d2.weight.mean())
for w,v in mu_collections.items():
print(w,v.mean(),az.hdi(v.values, hdi_prod=0.89))
But I realized that the intervals that I am printing are probably the ones for the actual parameter mu.
Then I thought on trying to sample from the posterior the parameters a and b, inserting the weights that I want to predict, and extract heights based on these parameters.
weights = np.array(weights)
post_heights = []
for i in range(len(posterior_checks['posterior_predictive']['a'][0])):
mu_pr = m4_3_trace['posterior']['a'][i] + m4_3_trace['posterior']['b'][i]*(weights - d2.weight.mean())
sigma_pr = m4_3_trace['posterior']['sigma'][i]
post_heights.append(np.random.normal(mu_pr, sigma_pr))
post_heights = np.array(post_heights)
for i, weight in enumerate(weights):
print(weight, np.percentile(post_heights[:, i], [5.5, 100-5.5]))
The code does not work beacause the shapes do not match (and I am struggling to understand how to make them, but that’s more a coding issue than pymc related, but if someone can help, more than welcome!), but my questions are:
- Is the reasoning correct?
- is there an easier way to make predictions once posteriors are generated?
Thank you in advance!