I want to run a weighted regression in PyMC but with a posterior distribution of weights I have precalculated on the same data.
Say I have obtained a weight matrix W of Q draws (i.e., draws × chains) by N observations. When calculating the regression likelihood, I want to multiply the normal log-pdf of the outcome elementwise by a vector w_q (of length N), taking a new row of W for each MCMC iteration q.
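To make the target concrete, here is what a single weighted log-likelihood evaluation looks like at the NumPy level, assuming W is Q × N so that W[q] is the vector w_q (the sizes, `normal_logpdf` helper, and toy data below are all illustrative, not part of the actual model):

```python
import numpy as np

def normal_logpdf(y, mu, sigma):
    """Elementwise log-density of y under Normal(mu, sigma)."""
    return -0.5 * np.log(2 * np.pi) - np.log(sigma) - 0.5 * ((y - mu) / sigma) ** 2

# Hypothetical toy sizes: N observations, Q precomputed weight draws.
rng = np.random.default_rng(0)
N, Q = 5, 3
y = rng.normal(size=N)
W = rng.uniform(0.5, 1.5, size=(Q, N))  # Q x N: row q is the weight vector w_q

# For MCMC iteration q, the weighted log-likelihood is sum_n W[q, n] * logpdf(y_n).
q = 1
weighted_logp = np.sum(W[q] * normal_logpdf(y, 0.0, 1.0))
```

The question is then how to make q advance by one on every MCMC iteration inside a compiled PyMC model.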
Question:
What's the best way to feed each MCMC iteration a new draw (row) from that matrix
(assume Q matches and enough weights were drawn, i.e. number of MCMC iterations ≤ Q)?
Nice to have: ideally, I would like to hack the model definition rather than the sampler (the step function?). So it would look something like the following, where w is some way to fool a distribution object into spitting out predefined vectors from the matrix W in order.
So far I have tried to:
- Create a CustomDist for y_obs with a custom logp function that draws from a Python generator.
- Create a similar logp function that goes into a pm.Potential for y_obs.
- Create a CustomDist for w which has a generator for both logp and random.
But these approaches seem to use only a handful (1–10) of draws from the weights matrix (I keep track with an internal counter), rather than drawing new weights throughout sampling.
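The generator-with-counter pattern I tried can be sketched in plain Python as below (`weight_feed` is my own illustrative name). One plausible reason only a handful of draws get consumed is that the logp graph is compiled once, so a Python-side generator is only called during graph construction/tracing rather than once per MCMC iteration:

```python
import numpy as np

rng = np.random.default_rng(0)
N, Q = 4, 6
W = rng.uniform(0.5, 1.5, size=(Q, N))  # hypothetical precomputed weight draws

def weight_feed(W):
    """Yield rows of W in order, tracking how many have been consumed."""
    for counter, row in enumerate(W, start=1):
        weight_feed.counter = counter  # crude internal counter, as described above
        yield row

feed = weight_feed(W)
w0 = next(feed)  # first call returns W[0]
w1 = next(feed)  # second call returns W[1]
```

The generator itself behaves as intended in pure Python; the difficulty is getting `next(feed)` to fire once per sampler iteration inside a compiled model.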
However, my knowledge of PyMC's inner workings is very limited, and I don't always understand what I'm trying.
A custom step method would also be acceptable, although slightly unfortunate, as the solution will probably be less generic and harder to understand.
I skimmed the NUTS/BaseHMC/ArrayStepShared code; it's quite complex, and there is relatively little documentation and few examples on custom steps in PyMC 5.
Do you have a sense of what the minimal change could be?
Where should I intervene: step, astep, or _hamiltonian_step?
How would the custom step be incorporated into the pm.Model()?
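For reference, here is a structural sketch of what I imagine the minimal custom-step change to be: a thin wrapper that advances a weight index and pushes W[q] into the model before delegating each astep call to the wrapped sampler. In real PyMC, w would presumably live in a pytensor shared variable updated via set_value, and the wrapped step would be passed to pm.sample(step=...); everything below (`WeightAdvancingStep`, `set_weights`, the identity inner step) is a hypothetical mock, not actual PyMC API:

```python
import numpy as np

class WeightAdvancingStep:
    """Mock step wrapper: advance the weight row, then take one inner step."""

    def __init__(self, inner_step, W, set_weights):
        self.inner_step = inner_step    # e.g. a NUTS instance in real PyMC
        self.W = W                      # Q x N precomputed weight draws
        self.set_weights = set_weights  # callback that updates the model's w
        self.q = 0

    def astep(self, point):
        # Push the next weight row into the model, then delegate the MCMC step.
        self.set_weights(self.W[self.q % len(self.W)])
        self.q += 1
        return self.inner_step(point)

# Dummy inner step and weight store, just to exercise the wrapper.
current_w = {}
dummy_inner = lambda point: point  # identity "step" for the sketch
W = np.arange(12.0).reshape(3, 4)
step = WeightAdvancingStep(dummy_inner, W, lambda w: current_w.update(w=w))

for _ in range(5):
    step.astep({"mu": 0.0})
```

Whether astep is actually the right seam (versus step or _hamiltonian_step) is exactly what I'm unsure about.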