Modeling bivariate beta distributions

I’ve run into this modeling problem before. It took me a long time to wrap my head around this, and I still get confused occasionally. It’s a problem I’ve been struggling with for a few months now, so I apologize if this is incorrect, too.

The model you’ve described in post #6 doesn’t have enough data to learn the correlations. Suppose you have 100 trials, and 30 successes for A and 70 for B. This says little about the probability of both trials succeeding; the following could all be the case:

fail A succeed A fail A succeed A fail A succeed A
succeed B 9 21 0 30 30 0
fail B 21 49 30 40 0 70

These all have different correlations. Using a binomial likelihood hides the correlation from the model, because all it sees are the aggregate counts (30 successes for A, 70 for B). The model does not have enough information to recover the (latent) correlation when just given the aggregates.

With all that said, I am still not totally satisfied by the above explanation. Intuitively, it seems to me that by drawing the (logit) probability p from an MvNormal, I should be specifying that there’s a possible correlation between p_a and p_b, and that the model should recover that. In practice, I think “possible” is the operant word in that sentence, and in the absence of more data the problem gets reduced to two independent Binomials.

If you have the individual trial data, you might try using that. This would be a model where you draw logit p from an MVN (or p from correlated Betas) then pass those to a Bernoulli, instead of a Binomial. I’ve had trouble fitting this kind of model in practice, though, so if you have that data and succeed I’d love to hear about it.