Predictions and evaluation of linear regression

nanounanue · November 20, 2018, 11:01pm

Hi

Maybe this is a silly question, but I am quite confused.

I am following this tutorial https://docs.pymc.io/notebooks/multilevel_modeling.html from the pymc3 docs. Everything is smooth until the last part: Predictions

Basically, the author states:

Gelman (2006) used cross-validation tests to check the prediction error of the unpooled, pooled, and partially-pooled models

root mean squared cross-validation prediction errors :
…

My first question is: Could someone share some code (or nudge me in the right direction) that implements this (In the aforementioned article they use LOOCV on the houses)?

Second question: How can I calculate other metrics like MAE ? Should I use the mean as in the tutorial? It seems weird to use a point prediction for evaluation instead of the full distribution…

Third question: How can I predict new points? (The Gelman article does that too). The author of the tutorial, does something weird, it /runs/ everything adding a new point… this seems clumsy or undesirable, searching lead me to https://docs.pymc.io/notebooks/posterior_predictive.html?highlight=posterior%20predictive%20checks
There, the author suggest to use the ppc and a theano shared variable to update the model. The example in that documentation works great, but when I tried to replicate that with the hierarchical model from the first tutorial I got the following error:

TypeErrorTraceback (most recent call last)
<ipython-input-11-920de590bed5> in <module>
     18 
     19     # Expected value
---> 20     mu = pm.Deterministic('mu', alpha[clazz] + beta[clazz]*v_s_shared)
     21 
     22     # Data likelihood

TypeError: unsupported operand type(s) for *: 'TensorVariable' and 'SharedVariable'

My code is the following:

from theano import shared

v_s_shared = shared(v_s)

import pymc3 as pm

with pm.Model() as varying_intercept_slope:
    # Priors
    mu_a = pm.Normal('mu_a', mu=0, sd=1e5)
    sigma_a = pm.HalfCauchy('sigma_a', 5)
    mu_b = pm.Normal('mu_b', mu=0, sd=1e5)
    sigma_b = pm.HalfCauchy('sigma_b', 5)


    alpha = pm.Normal('alpha', mu=mu_a, sd=sigma_a, shape=n_clazzes)
    beta = pm.Normal('beta', mu=mu_b, sd=sigma_b, shape=n_clazzes)

    # Model error
    epsilon = pm.HalfCauchy('epsilon', 5)
    nu = pm.Deterministic('nu', pm.Exponential('nu_', 1/29) + 1)

    # Expected value
    mu = pm.Deterministic('mu', alpha[clazz] + beta[clazz]*v_s_shared)

    # Data likelihood
    #y_pred = pm.Normal('y_pred', mu=mu, sd=epsilon, observed=log_y)
    y_pred = pm.StudentT('y_pred', mu=mu, sd=epsilon, nu=nu, observed=log_y)

Hopefully, someone could help me with this… your help is really appreciated

junpenglao · November 21, 2018, 6:05am

Your code seems fine to me and I have never seen this error before - could you share your model with data in a notebook?

junpenglao · November 21, 2018, 8:57am

The Gelman 2006 is likely a bit outdated now, LOOCV has improved in recent years, you can use pm.loo which is the up to dated version.

See above, basically dont use MAE and use LOO or WAIC instead (there some introductory doc in http://docs.pymc.io)

pm.sample_posterior_predictive is the right way to go.

lucianopaz · November 21, 2018, 2:06pm

The error you’re seeing is probably caused by v_s's type. If it’s a numpy.array, I think that it is mostly likely of dtype object, and that’s why theano ends up raising a TypeError when you multiply it with beta. If it’s not a numpy.array, then maybe theano does not understand how to make it a shared tensor.

Tensor types sometimes raise errors that are hard to pin down. For example the mentioned in this post was caused because a numpy.array with dtype=object was converted to shared.

nanounanue · November 22, 2018, 9:11am

Thank you for your answer, but the type of v_s is dtype('float64')

nanounanue · November 22, 2018, 9:13am

Thank you for your suggestions. Could you show me an example of use of pm.sample_posterior_predictive?

junpenglao · November 22, 2018, 9:16am

see eg https://docs.pymc.io/notebooks/rugby_analytics.html?highlight=sample_posterior_predictive

nanounanue · November 22, 2018, 9:19am

Also, I am running pymc 3.5, and sample_posterior_predictive is not there… (only pm.ppc)

junpenglao · November 22, 2018, 9:28am

The two are the same - we renamed it for pymc 3.6

nanounanue · November 22, 2018, 9:33am

You were right!

v_s type was pandas.Series, so I changed the code to

v_s_shared = shared(v_s.values)

That solved that problem

lucianopaz · November 22, 2018, 9:52am

Great! Happy to hear that the problem was solved like that

nanounanue · November 22, 2018, 5:02pm

So, I will update

Topic		Replies	Views
Prediction using sample_ppc in Hierarchical model Questions from_github	6	4973	December 14, 2017
PyMC v5.10.3 prediction stuff v5 modeling	11	458	January 15, 2024
Observed and Simulated data difference and prediction of unseen data v5	0	308	August 2, 2023
Help for building model v5 modeling	0	139	April 12, 2024
Help Interpreting Hierarchical Linear Regression Results Questions	0	588	March 1, 2021

Predictions and evaluation of linear regression

Related topics