Jesse, first of all, and again, many thanks for digging into that.
Second, I second every word about focusing on estimands rather than estimators.
My stubbornness about the MSM was originally because research is not done in a vacuum, and I wanted to bridge familiar notions (IPW) with new ones (Bayesian inference). However, I’ll admit that by now it is also fueled by simply wanting to figure out why it does not work.
Which leads me to the third point: I followed your implementation. It seems the first main difference is the square-rooting of the weights (which I thought was an OLS thing, and not necessary for MCMC), and the second is using `pm.CustomDist` instead of `pm.Potential` (I followed Junpeng’s answer here, which does work when using fixed pre-computed weights).
However, your code does not improve on the estimation I previously got (I’m not seeing a shift towards the correct answer of 0).
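For context, here is roughly what I mean by the `pm.Potential` approach with fixed pre-computed weights (a minimal sketch with simulated placeholder data; `a`, `y`, and `w` are stand-ins, not my actual data or weights):

```python
import numpy as np
import pymc as pm

# Placeholder data: binary treatment `a`, outcome `y` with no true treatment
# effect, and `w` standing in for pre-computed inverse probability weights.
rng = np.random.default_rng(0)
n = 500
a = rng.binomial(1, 0.5, n)
y = rng.normal(0.0, 1.0, n)
w = np.ones(n)

with pm.Model() as msm_fixed_weights:
    intercept = pm.Normal("intercept", 0, 10)
    beta_a = pm.Normal("beta_a", 0, 10)
    sigma = pm.HalfNormal("sigma", 5)

    mu = intercept + beta_a * a

    # Weight each observation's log-likelihood contribution by its IP weight
    # and add the weighted sum to the model's joint log-probability.
    pm.Potential(
        "weighted_loglike",
        (w * pm.logp(pm.Normal.dist(mu=mu, sigma=sigma), y)).sum(),
    )

    idata = pm.sample()
```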
A follow-up thought: since using pre-computed fixed weights does seem to work, I wonder if the computation could be split into two steps. First, fit a Bayesian logistic regression and calculate N_chains-by-N_iterations sets of inverse probability weights. Second, fit a Bayesian linear regression, with each MCMC iteration weighted by the weights from the corresponding chain and iteration (roughly sketched after my questions below).
Does that make sense in PyMC (or at all)? Is there a way to access every iteration? I tried naively using the fixed non-Bayesian weights with your `CustomDist` `logp_wls` function (passing fixed weights instead of `ipa`), and it works. So I wonder if there’s a way to tell what iteration the sampler is in, or alternatively, to build a generator yielding the next weights and plug it into `logp_wls`.
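To make the two-step idea concrete, here is a rough sketch (again with simulated placeholder data, where `x` is a confounder, `a` the treatment, and `y` the outcome); step 2 is exactly the part I don’t know how to express in PyMC:

```python
import numpy as np
import pymc as pm

# Placeholder data: confounder `x`, treatment `a` depending on `x`,
# outcome `y` with no direct effect of `a`.
rng = np.random.default_rng(1)
n = 500
x = rng.normal(size=n)
a = rng.binomial(1, 1 / (1 + np.exp(-x)))
y = 2 * x + rng.normal(size=n)

# Step 1: Bayesian logistic regression for the treatment model.
with pm.Model() as treatment_model:
    b0 = pm.Normal("b0", 0, 5)
    b1 = pm.Normal("b1", 0, 5)
    p = pm.Deterministic("p", pm.math.sigmoid(b0 + b1 * x))
    pm.Bernoulli("a_obs", p=p, observed=a)
    idata_t = pm.sample()

# One inverse probability weight per observation, per (chain, draw).
p_post = idata_t.posterior["p"].values            # shape (chains, draws, n)
ipw = np.where(a == 1, 1 / p_post, 1 / (1 - p_post))

# Step 2 (the open question): fit the weighted outcome model of `y` on `a`
# so that MCMC iteration i uses ipw[chain, i, :] rather than a single fixed
# weight vector. With fixed weights (e.g. ipw.mean(axis=(0, 1))), the
# pm.Potential / logp_wls approach above works.
```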
[and yes, I see what you mean by “a lot of work for very little marginal values”]