After fiddling with some implementation details, I managed to get this running on the full data using the Bernoulli likelihood assumed by the other models. Switching to the Bernoulli likelihood restored the expected behavior relative to the other analyses.
However, this has a major drawback: computing the regression equation over the 160k observations becomes very expensive once subjects are included as predictors (~4 coefficients -> 41 coefficients; on my 16-core AMD machine, it takes close to a week to tune and draw 8k samples).
A couple of follow-up questions:
- Why is there a big, qualitative difference between the Bernoulli and binomial likelihoods here? I had intuited that they would be mathematically equivalent, but they don't seem to be interchangeable when sampling. (A sketch of the two parameterizations I'm comparing follows this list.)
- The binomial `n` varies by trial in my experiment; is that potentially an issue?
- I also found that, when computing the regression equation, it was more efficient to write out the dot product as an explicit multiply-and-sum rather than using `theano.tensor.dot`. The latter gave me memory errors when running on multiple cores and was slower on a single core (see the second sketch below). Does anybody have an understanding of Theano's behavior here?
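
For concreteness, here's a minimal sketch of the two parameterizations I'm comparing, written in PyMC3 with illustrative names and sizes (my real design matrix is ~160k x 41). Note that `pm.Binomial` accepts a per-row `n`, which is how I handle the varying trial counts:

```python
import numpy as np
import pymc3 as pm
import theano.tensor as tt

# Illustrative stand-ins for my data: the real design matrix has ~160k rows
# and 41 columns once subject terms are included.
rng = np.random.RandomState(0)
X = rng.randn(1000, 41)            # per-trial design matrix
y = rng.randint(0, 2, size=1000)   # per-trial 0/1 outcomes

# Bernoulli version: one likelihood term per trial.
with pm.Model() as bernoulli_model:
    beta = pm.Normal("beta", mu=0.0, sd=1.0, shape=X.shape[1])
    p = pm.math.sigmoid(tt.dot(X, beta))
    pm.Bernoulli("obs", p=p, observed=y)

# Binomial version: trials aggregated within condition cells, with a
# varying number of trials n per cell, as in my experiment.
X_agg = rng.randn(200, 41)          # one row per condition cell
n = rng.randint(5, 50, size=200)    # trials per cell (varies by cell)
k = rng.binomial(n, 0.5)            # successes per cell
with pm.Model() as binomial_model:
    beta = pm.Normal("beta", mu=0.0, sd=1.0, shape=X_agg.shape[1])
    p = pm.math.sigmoid(tt.dot(X_agg, beta))
    pm.Binomial("obs", n=n, p=p, observed=k)
```

My understanding is that when the predictor rows within a cell are identical, the joint Bernoulli likelihood and the binomial likelihood differ only by the constant binomial-coefficient term, so the posteriors should be identical; that's why the difference in sampling behavior surprises me.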
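
And for the Theano question, here's roughly what the two formulations of the regression equation look like (again a sketch, with illustrative names):

```python
import numpy as np
import theano
import theano.tensor as tt

X = tt.matrix("X")        # (observations, coefficients)
beta = tt.vector("beta")  # (coefficients,)

# Version 1: matrix-vector product, which Theano typically lowers to a
# BLAS gemv call.
mu_dot = tt.dot(X, beta)

# Version 2: explicit broadcast-multiply, then sum over the coefficient
# axis. Mathematically the same, but compiles to elementwise ops plus a
# reduction.
mu_explicit = (X * beta).sum(axis=-1)

# Both give identical results.
f = theano.function([X, beta], [mu_dot, mu_explicit])
xv = np.random.randn(5, 3).astype(theano.config.floatX)
bv = np.random.randn(3).astype(theano.config.floatX)
a, b = f(xv, bv)
assert np.allclose(a, b)
```

My naive guess is that `tt.dot` dispatches to a multithreaded BLAS routine that interacts badly with running multiple chains in separate processes, while the explicit multiply-and-sum stays in single-threaded elementwise ops, but that's speculation on my part.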