Hi
Maybe this is a silly question, but I am quite confused.
I am following this tutorial https://docs.pymc.io/notebooks/multilevel_modeling.html from the pymc3 docs. Everything is smooth until the last part: Predictions
Basically, the author states:
Gelman (2006) used cross-validation tests to check the prediction error of the unpooled, pooled, and partially-pooled models
root mean squared cross-validation prediction errors :
…
My first question is: Could someone share some code (or nudge me in the right direction) that implements this (In the aforementioned article they use LOOCV on the houses)?
Second question: How can I calculate other metrics like MAE ? Should I use the mean as in the tutorial? It seems weird to use a point prediction for evaluation instead of the full distribution…
Third question: How can I predict new points? (The Gelman article does that too). The author of the tutorial, does something weird, it /runs/ everything adding a new point… this seems clumsy or undesirable, searching lead me to https://docs.pymc.io/notebooks/posterior_predictive.html?highlight=posterior%20predictive%20checks
There, the author suggest to use the ppc
and a theano
shared
variable to update the model. The example in that documentation works great, but when I tried to replicate that with the hierarchical model from the first tutorial I got the following error:
TypeErrorTraceback (most recent call last)
<ipython-input-11-920de590bed5> in <module>
18
19 # Expected value
---> 20 mu = pm.Deterministic('mu', alpha[clazz] + beta[clazz]*v_s_shared)
21
22 # Data likelihood
TypeError: unsupported operand type(s) for *: 'TensorVariable' and 'SharedVariable'
My code is the following:
from theano import shared
v_s_shared = shared(v_s)
import pymc3 as pm
with pm.Model() as varying_intercept_slope:
# Priors
mu_a = pm.Normal('mu_a', mu=0, sd=1e5)
sigma_a = pm.HalfCauchy('sigma_a', 5)
mu_b = pm.Normal('mu_b', mu=0, sd=1e5)
sigma_b = pm.HalfCauchy('sigma_b', 5)
alpha = pm.Normal('alpha', mu=mu_a, sd=sigma_a, shape=n_clazzes)
beta = pm.Normal('beta', mu=mu_b, sd=sigma_b, shape=n_clazzes)
# Model error
epsilon = pm.HalfCauchy('epsilon', 5)
nu = pm.Deterministic('nu', pm.Exponential('nu_', 1/29) + 1)
# Expected value
mu = pm.Deterministic('mu', alpha[clazz] + beta[clazz]*v_s_shared)
# Data likelihood
#y_pred = pm.Normal('y_pred', mu=mu, sd=epsilon, observed=log_y)
y_pred = pm.StudentT('y_pred', mu=mu, sd=epsilon, nu=nu, observed=log_y)
Hopefully, someone could help me with this… your help is really appreciated