Just a couple more comments:
- I kind of messed up with the horseshoe prior. Thinking about this a bit more (and looking at random draws from a horseshoe, see the first sketch after this list) I no longer think it is a good idea: the horseshoe really is very sparse. A student-t or laplace prior probably makes more sense instead, although in practice it doesn't actually seem to make much difference in the estimates.
- Reasonable people might disagree about this, but I don't think that wide priors are a "safe option" that lets you avoid making assumptions. They are assumptions too, and often very unrealistic ones. A wide prior on a scale parameter, for example, says that you think the scale is probably large: most of its mass is far from 0. In my opinion it is usually better to fit models with different reasonable priors and compare how the choices affect the posterior. As an example of how you could do that I wrote a little notebook (the second sketch after this list shows the basic idea). I'm not sure how useful the interpretation of the prior as "belief" really is; I usually tend to think about priors more as part of the model, a way to add regularization. I guess you could also think about them in terms of sampling distributions of parameters across different experiments. But just from personal experience there is a cost to trying to be "objective" and use wide priors, and it often has unexpected and nasty consequences.
- You seem to assume that you need to pay for more complicated models by needing more data. I don't see why this would be the case. Overfitting doesn't occur the same way it does in many classical approaches. You can still get something similar to overfitting if you use bad priors (often priors that are too wide and don't provide regularization) or if your choice of model depends on the observed dataset (which, to be fair, is usually the case). If you were to fit a simple OLS model like `y ~ 1 + subject + altered` to the dataset, I would bet it would tell you that `altered > 0` with `p << 0.05`. (I haven't actually tried; the third sketch below shows how you could.) My interpretation of the posterior of the hierarchical model would be: there is definitely an effect on some of the subjects, but it is not entirely clear how the population mean is affected, although it looks like the population mean might be increased slightly. The second seems to me a more accurate way to interpret that dataset, and it doesn't somehow "require more data".
- About the problem of finding good sample sizes and experimental designs: I don't know of a good solution to that problem. I usually do a rough frequentist power analysis, then simulate a couple of datasets and look at the posteriors to better understand what might or might not work (see the last sketch below). This is hardly satisfying, and so far it has never worked out the way I expected. Some part of the model I wrote down beforehand always turned out to be nonsense, or my guesses for some parameters turned out to be far from ideal. The paper you mentioned sounds interesting, I'll have a look at it when I have the time.
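For the first point, a minimal sketch of what I mean by "looking at random draws": sample a coefficient under a horseshoe-style prior (normal with a half-Cauchy scale; I'm fixing the global scale at 1 to keep it short) and under student-t and laplace priors, and compare how much mass sits essentially at zero versus in the tails.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 10_000

# Horseshoe-style draws: beta ~ Normal(0, lambda), lambda ~ HalfCauchy(1)
# (the global scale tau is fixed at 1 here to keep the sketch short)
lam = np.abs(rng.standard_cauchy(n))
horseshoe = rng.normal(0, lam)

student_t = rng.standard_t(df=3, size=n)
laplace = rng.laplace(0, 1, size=n)

for name, draws in [("horseshoe", horseshoe), ("student-t", student_t), ("laplace", laplace)]:
    near_zero = np.mean(np.abs(draws) < 0.1)   # mass concentrated at zero
    far_out = np.mean(np.abs(draws) > 10)      # heavy-tail mass
    print(f"{name:>10}: P(|beta| < 0.1) = {near_zero:.2f}, P(|beta| > 10) = {far_out:.3f}")
```

The horseshoe puts a lot of mass very close to zero (that's the sparsity), which is not what you want if you expect most effects to be moderate but nonzero.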
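The prior-comparison idea from the notebook, roughly (this is a sketch with placeholder data and made-up prior scales, not the actual analysis; I'm using PyMC):

```python
import numpy as np
import pymc as pm
import arviz as az

y = np.random.default_rng(1).normal(0.3, 1.0, size=50)  # placeholder data

# Same likelihood, different reasonable priors on the effect
priors = {
    "normal": lambda: pm.Normal("effect", 0, 1),
    "student_t": lambda: pm.StudentT("effect", nu=3, mu=0, sigma=1),
    "laplace": lambda: pm.Laplace("effect", mu=0, b=1),
}

for name, make_prior in priors.items():
    with pm.Model():
        make_prior()
        sigma = pm.HalfNormal("sigma", 1)
        pm.Normal("obs", mu=pm.modelcontext(None)["effect"], sigma=sigma, observed=y)
        trace = pm.sample(progressbar=False)
    print(name)
    print(az.summary(trace, var_names=["effect"]))
```

If the posterior for `effect` barely moves across priors, the data dominate and the choice doesn't matter much; if it moves a lot, the prior matters and that is worth knowing and reporting.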
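The OLS bet, as I would actually try it (untested sketch; I'm generating a placeholder stand-in for the dataset with the assumed columns `y`, `subject`, `altered`, and treating `subject` as categorical):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Placeholder stand-in for the real dataset; the effect size 0.2 is made up
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "subject": np.repeat(np.arange(8), 10),
    "altered": np.tile([0, 1], 40),
})
df["y"] = 0.2 * df["altered"] + rng.normal(0, 1, size=len(df))

fit = smf.ols("y ~ 1 + C(subject) + altered", data=df).fit()
print(fit.params["altered"], fit.pvalues["altered"])
```

With the real dataset you'd just look at the `altered` coefficient and its p-value; the point is that a single flat effect estimate like this hides the per-subject structure that the hierarchical posterior makes visible.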
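And the simulate-then-look workflow from the last point, roughly (again PyMC; the effect sizes, sample sizes, and model structure here are all hypothetical guesses you'd replace with your own):

```python
import numpy as np
import pymc as pm
import arviz as az

rng = np.random.default_rng(123)
n_subjects, n_per_subject = 10, 20                        # planned design
true_pop_mean, true_subject_sd, noise_sd = 0.3, 0.5, 1.0  # my guesses

for sim in range(3):  # a couple of simulated experiments
    # Simulate a dataset from the guessed parameter values
    subject_effects = rng.normal(true_pop_mean, true_subject_sd, n_subjects)
    subject = np.repeat(np.arange(n_subjects), n_per_subject)
    y = rng.normal(subject_effects[subject], noise_sd)

    # Fit the hierarchical model you plan to use on the real data
    with pm.Model():
        pop_mean = pm.Normal("pop_mean", 0, 1)
        subject_sd = pm.HalfNormal("subject_sd", 1)
        effects = pm.Normal("effects", pop_mean, subject_sd, shape=n_subjects)
        pm.Normal("obs", effects[subject], noise_sd, observed=y)
        trace = pm.sample(progressbar=False)

    # Is the posterior on the population mean informative enough at this n?
    print(az.summary(trace, var_names=["pop_mean"]))
```

If the posterior intervals are too wide to answer the question at the planned sample size, you find that out before running the experiment rather than after.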