Predictions and evaluation of linear regression


Maybe this is a silly question, but I am quite confused.

I am following this tutorial from the pymc3 docs. Everything is smooth until the last part: Predictions

Basically, the author states:

Gelman (2006) used cross-validation tests to check the prediction error of the unpooled, pooled, and partially-pooled models

root mean squared cross-validation prediction errors:

My first question is: could someone share some code (or nudge me in the right direction) that implements this? (In the aforementioned article they use LOOCV on the houses.)

Second question: how can I calculate other metrics like MAE? Should I use the mean, as in the tutorial? It seems weird to use a point prediction for evaluation instead of the full distribution…

Third question: how can I predict new points? (The Gelman article does that too.) The author of the tutorial does something odd: he reruns everything after adding a new point… this seems clumsy and undesirable. Searching led me to
There, the author suggests using the ppc and a theano shared variable to update the model. The example in that documentation works great, but when I tried to replicate it with the hierarchical model from the first tutorial, I got the following error:

TypeErrorTraceback (most recent call last)
<ipython-input-11-920de590bed5> in <module>
     19     # Expected value
---> 20     mu = pm.Deterministic('mu', alpha[clazz] + beta[clazz]*v_s_shared)
     22     # Data likelihood

TypeError: unsupported operand type(s) for *: 'TensorVariable' and 'SharedVariable'

My code is the following:

from theano import shared

v_s_shared = shared(v_s)

import pymc3 as pm

with pm.Model() as varying_intercept_slope:
    # Priors
    mu_a = pm.Normal('mu_a', mu=0, sd=1e5)
    sigma_a = pm.HalfCauchy('sigma_a', 5)
    mu_b = pm.Normal('mu_b', mu=0, sd=1e5)
    sigma_b = pm.HalfCauchy('sigma_b', 5)

    alpha = pm.Normal('alpha', mu=mu_a, sd=sigma_a, shape=n_clazzes)
    beta = pm.Normal('beta', mu=mu_b, sd=sigma_b, shape=n_clazzes)

    # Model error
    epsilon = pm.HalfCauchy('epsilon', 5)
    nu = pm.Deterministic('nu', pm.Exponential('nu_', 1/29) + 1)

    # Expected value
    mu = pm.Deterministic('mu', alpha[clazz] + beta[clazz]*v_s_shared)

    # Data likelihood
    #y_pred = pm.Normal('y_pred', mu=mu, sd=epsilon, observed=log_y)
    y_pred = pm.StudentT('y_pred', mu=mu, sd=epsilon, nu=nu, observed=log_y)

Hopefully someone can help me with this… your help is really appreciated!

Your code seems fine to me, and I have never seen this error before - could you share your model and data in a notebook?

The Gelman (2006) approach is likely a bit outdated now; LOOCV has improved in recent years. You can use pm.loo, which is the up-to-date version.

See above: basically, don't use MAE; use LOO or WAIC instead (there is some introductory doc in

pm.sample_posterior_predictive is the right way to go.


The error you’re seeing is probably caused by v_s's type. If it’s a numpy.array, I think it is most likely of dtype object, and that’s why theano ends up raising a TypeError when you multiply it with beta. If it’s not a numpy.array, then maybe theano does not understand how to make it a shared tensor.

Tensor types sometimes raise errors that are hard to pin down. For example, the error mentioned in this post was caused by a numpy.array with dtype=object being converted to shared.


Thank you for your answer, but the type of v_s is dtype('float64').

Thank you for your suggestions. Could you show me an example of using pm.sample_posterior_predictive?

See e.g.

Also, I am running pymc3 3.5, and sample_posterior_predictive is not there… (only pm.sample_ppc)

The two are the same - we renamed it in PyMC3 3.6.

You were right!

v_s was a pandas.Series, so I changed the code to

v_s_shared = shared(v_s.values)

That solved the problem.


Great! Happy to hear that the problem was solved like that.

So, I will update :slight_smile: