GLM: Hierarchical Linear Regression - Posterior Predictive Check - RMSD

carlossouza · August 21, 2020, 1:43am

In the 2nd example notebook GLM: Hierarchical Linear Regression, the author states that the model with the lowest RMSE is the one that best explains the data. Then, he mentions, without providing the code, that:

individual/non-hierarchical model: 0.13
hierarchical model: 0.08

In my interpretation, these values should be calculate with the following code:

with unpooled_model:
    unpooled_ppc = pm.sample_posterior_predictive(unpooled_trace)

unpooled_rmse = ((unpooled_ppc['y'] - data.log_radon.values) ** 2).mean() ** (1/2)

with hierarchical_model:
    hierarchical_ppc = pm.sample_posterior_predictive(hierarchical_trace)

hierarchical_rmse = ((hierarchical_ppc['radon_like'] - data.log_radon.values) ** 2).mean() ** (1/2)

However, doing that generates unpooled_rmse and hierarchical_rmse approximately the same, around 1.02 (which are both very different than the result the author presents).

What am I missing? Can someone point me to the right direction?

Thanks!

EduardoCabria · April 15, 2021, 9:25am

Hi
I’m here to refloat this message, can anyone help us to get the RMSD code that is not shown in the example GLM: Hierarchical Linear Regression — PyMC3 3.10.0 documentation? (It is in the Posterior Predictive Chekcs area.)

Thankyou

EduardoCabria · April 15, 2021, 9:55am

OK, I think I found it here: Probabilistic Matrix Factorization for Making Personalized Recommendations — PyMC3 3.10.0 documentation

twiecki · April 15, 2021, 10:32am

I’m one of the authors, sorry for the late reply, hadn’t seen this thread.

As it’s been quite a while since I wrote this I don’t remember how exactly we did it and why we didn’t include the code. Intuitively I’d do it just like @carlossouza did but it’s surprising that it’s not on the same scale.

I would also make sure to use the non-centered version: Why hierarchical models are awesome, tricky, and Bayesian — While My MCMC Gently Samples so that there are no convergence issues affecting this.

Ultimately I wouldn’t be surprised if, after rerunning this years later, the results change slightly and maybe the hierarchical one does not beat the flat one. Any reason this is important to you?

OriolAbril · April 15, 2021, 12:38pm

The hierarchical model in the notebook has clearly not converged, so if your samples look like that I wouldn’t trust much any of its results. A Primer on Bayesian Methods for Multilevel Modeling — PyMC3 3.10.0 documentation should have the centered version of that same model and it converges.

There are also other things to update and improve, I’ll keep track of those at GLM hierarchical · Issue #80 · pymc-devs/pymc-examples · GitHub in case anyone seeing this is interested in updating the notebook.

Topic		Replies	Views
Prediction using sample_ppc in Hierarchical model Questions from_github	6	4973	December 14, 2017
Hierarchical regression models for ratings data ( 2 by 2 within-subject design) Questions	3	1741	December 12, 2019
Baysian hirerchical Linear Regression(Partial Pooling) model using PYMC version agnostic	13	222	November 24, 2024
Hierarchical Linear Regression - Simplest Possible Example Questions	2	693	December 30, 2020
Predictions and evaluation of linear regression Questions	11	1821	November 22, 2018

GLM: Hierarchical Linear Regression - Posterior Predictive Check - RMSD

Related topics