Yeah, no luck: I’m just getting a noisy average, and all the samples look identical. I can also see that the hyper_tau estimate is way too low. So I don’t know whether this is an error only during inference, or also during posterior sampling.
The above model, when run on a subset of the data (removing any total_size), works wonderfully with both NUTS and ADVI. The posterior means of the shifts recover the true values with impressive precision (pearsonr of 0.94 on a subset of 128 subjects), and the hyperparameters are well recovered too.
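For concreteness, by "precision" I mean the Pearson correlation between the posterior-mean shifts and the per-subject values used to simulate the data, along these lines (the two arrays below are just placeholders for the real trace and ground truth):

```python
import numpy as np
from scipy.stats import pearsonr

# placeholders: `shift_samples` stands in for the posterior draws of the shifts
# (n_draws x n_subjects, e.g. trace['shifts']) and `true_shifts` for the values
# used to simulate the data
shift_samples = np.random.randn(1000, 128)
true_shifts = np.random.randn(128)

r, _ = pearsonr(shift_samples.mean(axis=0), true_shifts)
print(r)  # with the real trace this is the number quoted above
```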
As soon as I switch to minibatch, the posterior standard deviation of the shifts collapses to near zero. The mean is roughly okay, but the fit between the sampled posterior shifts and their real values gives a pearsonr of 0.
So I’m guessing that either minibatching doesn’t play well with hierarchical models, or I’m not doing it right.
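In case the problem is in my wiring rather than in minibatching itself, here is a stripped-down sketch of the kind of setup I mean, using the PyMC3-era pm.Minibatch + total_size pattern on simulated data (names, priors, and the likelihood are illustrative, not my actual model):

```python
import numpy as np
import pymc3 as pm

n_subjects, N = 128, 50000
true_shifts = np.random.randn(n_subjects)        # ground-truth shift per subject
subj = np.random.randint(0, n_subjects, N)       # subject index for each observation
y = true_shifts[subj] + np.random.randn(N)       # simulated observations

# stack value and subject index so a single Minibatch slices them together
# and they stay aligned within each batch
stacked = np.column_stack([y, subj])
mb = pm.Minibatch(stacked, batch_size=500)
y_mb, subj_mb = mb[:, 0], mb[:, 1].astype('int32')

with pm.Model():
    hyper_mu = pm.Normal('hyper_mu', 0., 5.)
    hyper_tau = pm.HalfNormal('hyper_tau', 5.)   # used as a scale here
    shifts = pm.Normal('shifts', mu=hyper_mu, sigma=hyper_tau, shape=n_subjects)

    # total_size rescales the minibatch log-likelihood up to the full data size
    pm.Normal('obs', mu=shifts[subj_mb], sigma=1.,
              observed=y_mb, total_size=N)

    approx = pm.fit(50000, method='advi')
    trace = approx.sample(1000)
```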