GLM: Hierarchical Linear Regression - Posterior Predictive Check - RMSD

In the 2nd example notebook, GLM: Hierarchical Linear Regression, the author states that the model with the lowest RMSE is the one that best explains the data. He then mentions, without providing the code, these values:

  • individual/non-hierarchical model: 0.13
  • hierarchical model: 0.08

In my interpretation, these values should be calculated with the following code:

with unpooled_model:
    unpooled_ppc = pm.sample_posterior_predictive(unpooled_trace)

# RMSE over all posterior predictive draws of the unpooled model
unpooled_rmse = ((unpooled_ppc['y'] - data.log_radon.values) ** 2).mean() ** 0.5

with hierarchical_model:
    hierarchical_ppc = pm.sample_posterior_predictive(hierarchical_trace)

# RMSE over all posterior predictive draws of the hierarchical model
hierarchical_rmse = ((hierarchical_ppc['radon_like'] - data.log_radon.values) ** 2).mean() ** 0.5

However, that code produces approximately the same value for unpooled_rmse and hierarchical_rmse, around 1.02 (both very different from the results the author presents).

What am I missing? Can someone point me in the right direction?


Hi :slight_smile:
I’m bumping this thread: can anyone help us find the RMSD code that is not shown in the example GLM: Hierarchical Linear Regression — PyMC3 3.10.0 documentation? (It is in the Posterior Predictive Checks section.)


OK, I think I found it here: Probabilistic Matrix Factorization for Making Personalized Recommendations — PyMC3 3.10.0 documentation
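If I’m reading that notebook right, the key difference is that it computes the RMSE from a point prediction (the mean over posterior predictive draws, per observation) rather than over the raw draws, which would explain the gap. The raw-draw RMSE includes the full predictive spread (observation noise), while the RMSE of the posterior predictive mean only measures point-prediction error. A quick NumPy check with made-up shapes and numbers, just to illustrate how large the difference can be:

```python
import numpy as np

rng = np.random.default_rng(42)

n_obs, n_draws = 100, 500
y_true = rng.normal(1.2, 0.8, size=n_obs)  # stand-in for data.log_radon.values
# synthetic posterior predictive draws: truth plus predictive noise, shape (n_draws, n_obs)
ppc = y_true + rng.normal(0.0, 1.0, size=(n_draws, n_obs))

# RMSE over all raw draws (what the code in the first post computes)
rmse_all_draws = np.sqrt(((ppc - y_true) ** 2).mean())

# RMSE of the posterior predictive mean (point prediction per observation)
rmse_point = np.sqrt(((ppc.mean(axis=0) - y_true) ** 2).mean())

print(rmse_all_draws, rmse_point)
```

With predictive noise of scale 1, the raw-draw RMSE sits near 1 (much like the ~1.02 reported above), while the point-prediction RMSE is far smaller.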


I’m one of the authors, sorry for the late reply, hadn’t seen this thread.

As it’s been quite a while since I wrote this, I don’t remember exactly how we did it or why we didn’t include the code. Intuitively I’d do it just like @carlossouza did, but it’s surprising that the result is not on the same scale.

I would also make sure to use the non-centered version: Why hierarchical models are awesome, tricky, and Bayesian — While My MCMC Gently Samples so that there are no convergence issues affecting this.
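In case the reparameterization itself is the unfamiliar part: the non-centered trick samples a standard-normal offset and rescales it, instead of sampling the group-level effect directly from Normal(mu, sigma). The two forms define the same distribution; only the geometry the sampler sees changes. A quick NumPy illustration of that equivalence (just the algebra, not the PyMC3 model):

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 1.5, 0.3
n = 100_000

# centered: draw theta directly from its hierarchical prior
theta_centered = rng.normal(mu, sigma, size=n)

# non-centered: draw a standard-normal offset, then shift and scale
offset = rng.normal(0.0, 1.0, size=n)
theta_noncentered = mu + sigma * offset

# both parameterizations yield the same distribution
print(theta_centered.mean(), theta_noncentered.mean())
print(theta_centered.std(), theta_noncentered.std())
```

In a PyMC3 model this corresponds to replacing a Normal(mu, sigma) prior with a Normal(0, 1) offset variable and a Deterministic mu + sigma * offset.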

Ultimately I wouldn’t be surprised if, after rerunning this years later, the results change slightly and maybe the hierarchical one does not beat the flat one. Any reason this is important to you?

The hierarchical model in the notebook has clearly not converged, so if your samples look like that I wouldn’t trust any of its results much. A Primer on Bayesian Methods for Multilevel Modeling — PyMC3 3.10.0 documentation should have the non-centered version of that same model, and it converges.

There are also other things to update and improve, I’ll keep track of those at GLM hierarchical · Issue #80 · pymc-devs/pymc-examples · GitHub in case anyone seeing this is interested in updating the notebook.
