Where can I find successful examples of inferring a large number of Bernoulli variables?
A large part of my model (A) consists of many (about 1000) Bernoulli distributions. The rest of the model (B) consists of continuous distributions. If I fix the parameters in (A) to their true values, inferring the parameters in (B) with NUTS works very well; mixing is very fast. But mixing is horribly slow when I try to infer the whole model (A + B) with NUTS.
I tried using expectation maximization to update the parameters in (B) and NUTS to update (A). Mixing for (A) slows down considerably, and mixing for (B) is still horribly slow.
BinaryMetropolis seems like the next thing to try for (A). But the manual warns about mixing NUTS with another sampler (section “Issues with mixing discrete and continuous sampling”).
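For concreteness, this is the kind of compound-step setup I mean. The model below is just a made-up stand-in (z plays the role of (A); theta and mu play the role of (B)); only the step assignment matters:

```python
import numpy as np
import pymc as pm

# Toy stand-in data: each y_i comes from one of two components
rng = np.random.default_rng(0)
y = rng.normal(np.where(rng.random(1000) < 0.3, 2.0, -2.0), 1.0)

with pm.Model() as joint:
    theta = pm.Beta("theta", 1.0, 1.0)          # part of (B)
    mu = pm.Normal("mu", 0.0, 5.0, shape=2)     # part of (B)
    z = pm.Bernoulli("z", p=theta, shape=1000)  # part (A)
    pm.Normal("y", mu=mu[z], sigma=1.0, observed=y)

    # Compound step: NUTS on the continuous block, BinaryGibbsMetropolis
    # on the ~1000 binary variables -- the combination the manual warns about
    step = [pm.NUTS([theta, mu]), pm.BinaryGibbsMetropolis([z])]
    idata = pm.sample(step=step)
```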
Given the continuous parameters (B), all of the Bernoulli variables in (A) are independent of each other, and each Bernoulli depends on only one of the continuous parameters.
Can I go further and marginalize out the Bernoulli variables?
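My understanding is that this conditional independence makes the marginalization tractable: the likelihood factorizes into per-variable two-term sums. A minimal sketch of what I mean, on the same made-up stand-in model as above (theta, mu, and the Normal likelihood are all assumptions for illustration):

```python
import numpy as np
import pymc as pm
import pytensor.tensor as pt

# Same made-up stand-in data as above
rng = np.random.default_rng(0)
y = rng.normal(np.where(rng.random(1000) < 0.3, 2.0, -2.0), 1.0)

with pm.Model() as marginalized:
    theta = pm.Beta("theta", 1.0, 1.0)       # P(z_i = 1), part of (B)
    mu = pm.Normal("mu", 0.0, 5.0, shape=2)  # part of (B)

    # log p(y_i | z_i = k) for k = 0, 1 -> shape (2, n)
    logp_yk = pm.logp(pm.Normal.dist(mu[:, None], 1.0), y)

    # log P(z_i = k) -> shape (2, 1); a different p_i per Bernoulli
    # would broadcast here just as well
    logp_z = pt.stack([pt.log1p(-theta), pt.log(theta)])[:, None]

    # log sum_k P(z_i = k) p(y_i | z_i = k), summed over i: every z_i
    # is summed out analytically, so no discrete variables remain
    pm.Potential("marginal_loglike",
                 pt.sum(pm.math.logsumexp(logp_z + logp_yk, axis=0)))

    idata = pm.sample()  # NUTS alone handles everything
```

(pm.Mixture would express this particular model more compactly; I wrote out the logsumexp to show exactly what gets marginalized.)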
Right now I am fitting the model to data simulated from the model itself.
Thinking about this again, though, I doubt that the Bernoulli variables in the actual data are independent of each other.
I might have to run principal component analysis (PCA) on the actual data before feeding it into the model. But using PCA would cause other problems that I have to think through.
What is the result of projecting the Bernoulli variables onto the eigenvectors that I get from PCA? That looks like a linear combination of the Bernoulli variables, but what distribution does such a linear combination follow in this case?
Would that make the distribution less discrete and make the situation easier for PyMC?
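To make these last questions concrete, here is the kind of projection I mean, on made-up correlated binary data (the latent factor structure is purely for illustration, not my actual data):

```python
import numpy as np
from sklearn.decomposition import PCA

# Made-up correlated binary data: thresholded latent Gaussian factors
rng = np.random.default_rng(0)
n_obs, n_bern = 5000, 1000
latent = rng.normal(size=(n_obs, 5)) @ rng.normal(size=(5, n_bern))
x = (latent + rng.normal(size=(n_obs, n_bern)) > 0).astype(float)

# Project the 0/1 vectors onto the leading principal directions;
# each score is a weighted sum of ~1000 centered Bernoulli variables
scores = PCA(n_components=5).fit_transform(x)

# Eyeball how discrete or continuous the projected values actually look
print(np.percentile(scores[:, 0], [1, 25, 50, 75, 99]).round(2))
```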