Row-to-column ratio in Bayesian models

Hi people,
What should be the minimal row-to-column ratio in data for Bayesian models?
I have heard in some places that it should be at least 10:1.
Can any of you comment on this with evidence from any paper or article?
Thanks

Assuming you are talking about a data matrix x of covariates for regression (e.g., linear or logistic), there’s no minimum per se in either Bayesian approaches or penalized maximum likelihood (e.g., ridge or lasso regression). What you will find in Bayesian models is that more data reduces posterior uncertainty. What is even more common is to find large numbers of groups organized into a hierarchical model.

Rather than a reference, you can simulate and see what happens when you know the ground truth. Even with fewer observations than covariates (N < P in statistics terminology, where N is what you are calling rows and P is columns or number of covariates), you should get wide but calibrated posterior intervals.
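Here is a minimal sketch of that kind of simulation, in case it helps. It is not from any of the references, and it makes simplifying assumptions so the posterior is available in closed form: a linear regression with known noise scale sigma, an independent normal(0, tau) prior on each coefficient, and true coefficients drawn from that same prior (which is what makes the intervals calibrated on average). The function names `posterior` and `simulate` are just made up for illustration.

```python
import numpy as np


def posterior(X, y, sigma, tau):
    """Closed-form posterior for y ~ N(X @ beta, sigma^2 I), beta ~ N(0, tau^2 I)."""
    P = X.shape[1]
    precision = X.T @ X / sigma**2 + np.eye(P) / tau**2
    cov = np.linalg.inv(precision)
    mean = cov @ X.T @ y / sigma**2
    return mean, cov


def simulate(N, P, sigma=1.0, tau=1.0, reps=200, seed=0):
    """Return (coverage, mean width) of central 90% posterior intervals."""
    rng = np.random.default_rng(seed)
    z = 1.6448536269514722  # 95th percentile of the standard normal
    covered, widths = [], []
    for _ in range(reps):
        beta = rng.normal(0.0, tau, size=P)   # ground truth drawn from the prior
        X = rng.normal(size=(N, P))           # N rows, P columns
        y = X @ beta + rng.normal(0.0, sigma, size=N)
        mean, cov = posterior(X, y, sigma, tau)
        sd = np.sqrt(np.diag(cov))            # marginal posterior sd per coefficient
        covered.append(np.abs(beta - mean) <= z * sd)
        widths.append(2 * z * sd)
    return float(np.mean(covered)), float(np.mean(widths))


if __name__ == "__main__":
    for N in (20, 100, 1000):
        coverage, width = simulate(N, P=50)
        print(f"N={N:5d}  P=50  coverage={coverage:.3f}  mean 90% width={width:.3f}")
```

Under these assumptions the coverage should stay near 90% even at N = 20 with P = 50, while the intervals narrow as N grows. With a prior that does not match how the true coefficients were generated, the coverage can drift, so it is worth rerunning with the prior and data sizes you actually have in mind.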

I’m not sure what you’re looking for in terms of references. If you want a deep dive, check out Gelman et al.'s Bayesian Data Analysis (BDA) on the Bayesian side and Hastie, Tibshirani and Friedman’s text on the frequentist side for discussions. An easier place to start is Gelman and Hill’s regression book (the new version is only the first half; they haven’t finished the second half revision yet). One of Gelman and Rubin’s favorite examples is eight schools, where you get both means and variances from measurements in each of eight schools. It then leads to discussions of priors. See chapter 5 of BDA for hierarchical models. There’s a free pdf on the book’s home page.
