I’m fitting a Bayesian logistic regression model with about 100k observations and 15 independent variables. Sampling the posterior distribution of the coefficients is slow. Is there any way to do it faster?
Here is the code for the sampling:
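Roughly like this (a simplified sketch: simulated data stands in for the real dataset and the priors are placeholders, not my exact code):

```python
import numpy as np
import pymc as pm

# Simulated stand-in for the real data: ~100k rows, 15 predictors.
rng = np.random.default_rng(0)
X = rng.normal(size=(100_000, 15))
y = rng.binomial(1, 0.5, size=100_000)

with pm.Model() as logreg:
    intercept = pm.Normal("intercept", 0, 5)
    beta = pm.Normal("beta", 0, 5, shape=15)
    p = pm.math.sigmoid(intercept + pm.math.dot(X, beta))
    pm.Bernoulli("obs", p=p, observed=y)
    idata = pm.sample()  # defaults to NUTS; this is the slow step
```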
@cluhmann sorry to interject here, but unless they have put some new voodoo into the “more modern” MC algorithms, Metropolis should actually be superior in terms of speed (given close enough priors). You are right that it is certainly not a good choice, but the question was about speed.
@jsh980210 in my experience slow sampling is an indication of an over-complex or even misspecified model. I remember running into this when I defined two model components that contributed to the same slope; the sampler would arbitrarily split the value between them and never converge.
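For illustration only (a toy sketch, not the model from back then): two coefficients multiply the same predictor, so only their sum is identified and the sampler just trades mass between them:

```python
import numpy as np
import pymc as pm

# Toy example of the pathology: b1 and b2 both scale the same predictor,
# so only b1 + b2 is identified and the chains wander along b1 = c - b2.
rng = np.random.default_rng(42)
x = rng.normal(size=200)
y = 2.0 * x + rng.normal(scale=0.5, size=200)

with pm.Model() as unidentified:
    b1 = pm.Normal("b1", 0, 10)
    b2 = pm.Normal("b2", 0, 10)  # redundant with b1
    sigma = pm.HalfNormal("sigma", 1)
    pm.Normal("y_obs", mu=(b1 + b2) * x, sigma=sigma, observed=y)
    idata = pm.sample()  # expect slow sampling and poor convergence diagnostics
```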
So, in short, your general question cannot be answered in general. Please provide a minimal example (and, for me at least, the process of creating a minimal example is the most instructive part).
Also, sampling speed depends on your hardware and software, so consider sharing at least approximate details about your setup so that the devs can rule it out as a bottleneck.
Why do you say that? Metropolis is infamous for generating highly auto-correlated draws and for needing many more steps (and logp evaluations) to achieve the same effective sample size as modern alternatives like NUTS that can exploit gradient information.
Perhaps you meant custom Gibbs samplers that can take random draws from the conditional posterior distributions without any rejection steps?
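To make that concrete, a toy comparison along these lines (simulated data, arbitrary sizes; not a benchmark) typically shows Metropolis delivering a small fraction of the effective sample size NUTS gets from the same number of draws:

```python
import numpy as np
import pymc as pm
import arviz as az

# Small logistic model on simulated data; same number of draws per sampler.
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 3))
y = rng.binomial(1, 0.5, size=500)

with pm.Model():
    beta = pm.Normal("beta", 0, 5, shape=3)
    pm.Bernoulli("obs", p=pm.math.sigmoid(pm.math.dot(X, beta)), observed=y)
    idata_nuts = pm.sample(1000, tune=1000)                       # default: NUTS
    idata_met = pm.sample(1000, tune=1000, step=pm.Metropolis())  # forced Metropolis

print(az.ess(idata_nuts))  # typically close to the number of draws
print(az.ess(idata_met))   # typically far smaller, due to autocorrelation
```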
Purely from applied coding experience: when I used to set Metropolis explicitly in well-defined test cases (back in the days when two PyMCs existed in parallel, you know), it would run faster. But maybe I am wrong even about that. And apologies if writing “voodoo” offended; as a frequent user I am truly amazed by the improvements in sampler methods and their implementation in pymc by you guys! Thanks for that!
But discussing samplers was not the point of the OP, right? I just found the Metropolis response a bit blunt and thought we might be able to help more if we asked for more info.
Certainly computing one step of the Metropolis algorithm is cheaper than computing one step of NUTS, but you can’t extrapolate from that to a statement about relative speeds. Doing so ignores the total number of evaluations required to get a posterior estimate. For a fixed target ESS, Metropolis may (and does!) require hundreds or thousands of times more logp evaluations than NUTS, so the more computationally expensive NUTS steps actually end up saving time per effective sample.
It’s true that one needs to be much more specific about exactly what kind of speed is desired. I can give you a zillion samples instantly if you’re happy with them all being θ = -7.1285. But the speed seen in simpler sampling algorithms is often illusory: more samples per unit time, but less information per sample (and often less information per unit time). There are certainly situations in which it makes sense to strategically select sampling algorithms, but it should be done with caution and is generally not a profitable way to find “speed”.
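One way to see through that illusion is to measure effective samples per second rather than raw draws per second; a rough sketch (toy model, wall-clock timing, illustrative only):

```python
import time
import numpy as np
import pymc as pm
import arviz as az

# Toy regression; compare effective samples per second, not raw draws per second.
rng = np.random.default_rng(2)
x = rng.normal(size=1_000)
y = 1.5 * x + rng.normal(size=1_000)

def ess_per_second(use_metropolis=False):
    with pm.Model():
        b = pm.Normal("b", 0, 10)
        sigma = pm.HalfNormal("sigma", 1)
        pm.Normal("y_obs", mu=b * x, sigma=sigma, observed=y)
        step = pm.Metropolis() if use_metropolis else None  # None -> PyMC chooses NUTS
        start = time.time()
        idata = pm.sample(1000, tune=1000, step=step, progressbar=False)
        elapsed = time.time() - start
    return float(az.ess(idata)["b"]) / elapsed

print("NUTS:      ", ess_per_second())
print("Metropolis:", ess_per_second(use_metropolis=True))  # usually far lower
```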
Apologies if my initial comment was blunt. I was trying to be quick, but it may have come off as dismissive. My general advice is to (almost) never specify step functions by hand, because doing so often hurts performance relative to the default step functions pymc selects (which is why it seemed relevant to the original request).