Hello,
Thanks a lot for your help.
What did the trick for me was reparametrizing the Student-t as explained here.
All the main parameters now have an r-hat of 1.00, and only a few experiments (out of 600) have an r-hat of 1.01.
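(For anyone finding this later: the reparametrization relies on the Gaussian scale-mixture representation of the Student-t, i.e. if $\tau' \sim \mathrm{Gamma}(\nu/2,\ \nu/2)$ and $\alpha \sim \mathcal{N}(0, 1)$, then $\alpha / \sqrt{\tau'} \sim \mathrm{StudentT}(\nu)$ — which is exactly what the `tau_prime` / `alpha` block below implements.)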
The code, translated into the eight-schools setting, now looks like:
```python
import numpy as np
import pymc as pm
import pymc.sampling_jax  # needed so that pm.sampling_jax is available
import arviz as az

J = 8
y = np.array([28.0, 8.0, -3.0, 7.0, -1.0, 1.0, 18.0, 12.0])
sigma = np.array([15.0, 10.0, 16.0, 11.0, 9.0, 11.0, 10.0, 18.0])

with pm.Model() as NonCentered_eight:
    # base parameters
    mu = pm.Normal("mu", mu=0, sigma=5)
    tau = pm.HalfCauchy("tau", beta=5)

    # Student-t reparametrization for theta_tilde:
    # a StudentT(nu) is a Normal(0, 1) divided by sqrt(Gamma(nu/2, nu/2))
    nu = pm.Gamma("nu", 2, 0.1)
    half_nu = pm.Deterministic("half_nu", 0.5 * nu)
    tau_prime = pm.Gamma("tau_prime", half_nu, half_nu)
    alpha = pm.Normal("alpha", 0, 1, shape=y.shape)
    theta_tilde = pm.Deterministic("theta_tilde", alpha / pm.math.sqrt(tau_prime))

    # non-centered parametrization
    theta = pm.Deterministic("theta", mu + tau * theta_tilde)
    obs = pm.Normal("obs", mu=theta, sigma=sigma, observed=y)

    prior = pm.sample_prior_predictive(samples=10000)
    trace = pm.sampling_jax.sample_numpyro_nuts(tune=40000, target_accept=0.99)
    posterior_predictive = pm.sample_posterior_predictive(trace)

model_summary = az.summary(trace)
model_summary
```
However, posterior predictive checks (on my real-life use case) show that my model underestimates the mass at or around zero.
Indeed, my actual distribution has both a peak around zero (like a normal distribution) and fat tails (like a Student-t with a low degree of freedom), whereas the Student-t trades those two properties off against each other (i.e. the fatter the tails, the lower the mass around zero).
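For reference, the check was done along these lines (a minimal sketch against the eight-schools model above; the `extend_inferencedata` keyword assumes a reasonably recent PyMC version):

```python
# Attach posterior predictive draws to the InferenceData, then overlay
# them on the observed data to compare the fitted and empirical shapes.
with NonCentered_eight:
    pm.sample_posterior_predictive(trace, extend_inferencedata=True)

az.plot_ppc(trace, num_pp_samples=200)
```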
I may therefore try a mixture model (point-normal or Student-t) in the coming days, perhaps something like the sketch below.
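In case it is useful, here is a minimal, untested sketch of what I have in mind (the names `sigma_spike` and `sigma_tail` and their prior scales are my own placeholders, and it reuses `y` from the snippet above):

```python
with pm.Model() as mixture_model:
    # mixing weights between the narrow and fat-tailed components
    w = pm.Dirichlet("w", a=np.ones(2))

    # narrow normal component to capture the mass around zero
    sigma_spike = pm.HalfNormal("sigma_spike", sigma=1.0)

    # fat-tailed Student-t component
    nu = pm.Gamma("nu", 2, 0.1)
    sigma_tail = pm.HalfNormal("sigma_tail", sigma=5.0)

    components = [
        pm.Normal.dist(mu=0.0, sigma=sigma_spike),
        pm.StudentT.dist(nu=nu, mu=0.0, sigma=sigma_tail),
    ]
    obs = pm.Mixture("obs", w=w, comp_dists=components, observed=y)
```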
That’ll be another topic.
Best regards,
Timothée