Hi all,
I am a seasoned PyMC3 user, primarily for large-scale computational biology applications, and a first-time poster – excited to join this discourse!
I am bogged down by a technical issue and would appreciate any help. Let’s say I have a complicated model with many global RVs and a few local RVs. To clarify, in the context of a GMM, I would call the mean/covariance of the Gaussian components global RVs and the responsibility of each data point a local RV (not to be confused with local/global in the context of AEVB).
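To make the terminology concrete, here is a minimal toy sketch (hypothetical names and sizes; my real model is much larger):

```python
import numpy as np
import pymc3 as pm

data = np.random.randn(1000)   # stand-in for the real dataset
n_components = 2

with pm.Model() as model:
    # global RVs: shared by all data points
    mu = pm.Normal('mu', mu=0., sd=10., shape=n_components)
    sd = pm.HalfNormal('sd', sd=1., shape=n_components)
    # local RVs: one responsibility vector per data point
    resp = pm.Dirichlet('resp', a=np.ones((len(data), n_components)),
                        shape=(len(data), n_components))
    obs = pm.NormalMixture('obs', w=resp, mu=mu, sd=sd, observed=data)
```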
Generally speaking, I would like to learn the variational posterior of my model from a very large dataset via ADVI and store the posterior parameterization (mu, rho) of the global RVs for future use. When a new data point arrives, I would then only need to perform inference on the local RVs, reusing my previous knowledge of the global RVs without updating their posterior any further.
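Concretely, continuing the toy sketch above, the "learn once and store" half is roughly the following (I am assuming the MeanField params list is ordered [mu, rho]; please correct me if that is wrong):

```python
with model:
    approx = pm.fit(n=50000, method='advi')   # MeanField approximation

# flattened variational parameters over *all* free RVs
mu_flat = approx.params[0].get_value()    # assumed to be mu
rho_flat = approx.params[1].get_value()   # assumed to be rho

# I would then persist (at least) the entries that belong to the global RVs
np.savez('global_posterior.npz', mu=mu_flat, rho=rho_flat)
```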
One possible approach is to instantiate the same model again, set the mu and rho of the global RVs to the previously obtained values, and then somehow forbid their update in the ADVI step function. This might be doable, for instance, by excluding the parts of mu and rho that correspond to the global RVs from the gradient calculation, or by somehow short-circuiting their gradients to 0. I am more or less familiar with the low-level details of PyMC3, but I have not been able to figure out a simple way of doing this without maintaining a hacky fork. Any suggestion is very much appreciated, including better ways of achieving the same goal without jumping through hoops!
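Setting the values seems easy enough; it is the "forbid their update" part that I cannot find a clean hook for. Roughly (same assumptions as above; global_slice is a hypothetical placeholder for the slice of the flattened parameter vectors owned by the global RVs, which in practice I would read off the approximation's ordering):

```python
saved = np.load('global_posterior.npz')

# hypothetical: slice of the flattened parameter vectors owned by the globals,
# e.g. mu (2) + sd (2) in the toy sketch, if they happen to come first
global_slice = slice(0, 4)

with new_model:                          # same model, rebuilt around the new data
    inference = pm.ADVI()
    mu_start = inference.approx.params[0].get_value()
    rho_start = inference.approx.params[1].get_value()
    # write the stored global entries into their slice
    mu_start[global_slice] = saved['mu'][global_slice]
    rho_start[global_slice] = saved['rho'][global_slice]
    inference.approx.params[0].set_value(mu_start)
    inference.approx.params[1].set_value(rho_start)
    # ...but nothing here prevents the step function from updating those
    # entries again, which is exactly what I want to forbid
    approx = inference.fit(n=10000)
```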
ps1> An unsatisfying/approximate workaround is to neglect rho altogether and treat the mu of the global RVs as deterministic. One can put together a simpler model where only the local RVs are free and use the MAP estimate of the global RVs wherever they appear. However, this makes the model over-confident about the global RVs, which is certainly not desirable.
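In code, this workaround would look roughly like this (new_data being the hypothetical incoming data; the globals become plain arrays, so all posterior uncertainty about them is lost):

```python
with model:
    map_est = pm.find_MAP()

with pm.Model() as local_only_model:
    # global RVs replaced by their MAP point estimates (plain numpy arrays)
    mu_hat = map_est['mu']
    sd_hat = map_est['sd']
    # only the local RVs remain free
    resp = pm.Dirichlet('resp', a=np.ones((len(new_data), n_components)),
                        shape=(len(new_data), n_components))
    obs = pm.NormalMixture('obs', w=resp, mu=mu_hat, sd=sd_hat,
                           observed=new_data)
```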
ps2> A possible workaround is passing the global RV nodes as consider_constant to theano.grad(loss_or_grads, params) inside pymc3.variational.updates.get_or_compute_grads.
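For concreteness, the change I have in mind inside get_or_compute_grads would be roughly the following, where global_rv_nodes is a hypothetical list of the global RVs' nodes in the ELBO graph:

```python
# inside pymc3.variational.updates.get_or_compute_grads, roughly:
#     return theano.grad(loss_or_grads, params)
# would become something like:
return theano.grad(loss_or_grads, params,
                   consider_constant=global_rv_nodes)  # hypothetical list
```

The downside is that this means either patching PyMC3 or computing the gradient list myself and passing it to the optimizer (the Lasagne-style updates do accept precomputed gradients).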
ps3> What about setting the global RVs to constant tensors using more_replacements in ObjectiveFunction?
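Something like the sketch below, assuming more_replacements is forwarded from pm.fit down to the step function (which I believe it is, but have not verified):

```python
import theano.tensor as tt

# stored point values for the globals (hypothetical names, e.g. posterior means)
replacements = {
    model['mu']: tt.as_tensor_variable(stored_mu_mean),
    model['sd']: tt.as_tensor_variable(stored_sd_mean),
}

with model:
    approx = pm.fit(n=10000, method='advi', more_replacements=replacements)
```

Though I suppose that, to keep the stored uncertainty rather than collapse to a point, the replacement would have to be a sample drawn from the stored (mu, rho) instead of a constant.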
ps4> What about wrapping the objective optimizer in such a way that only the slice of the gradients corresponding to the local RVs is used? I wonder whether Theano is smart enough not to waste computation time on the part of the gradients we ignore.
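What I have in mind is a thin wrapper around a stock optimizer that zeroes out the gradient entries of the global RVs before the update (global_slice as before, hypothetical):

```python
import theano
import theano.tensor as tt
from pymc3.variational.updates import adam

def local_only_optimizer(loss_or_grads, params, **kwargs):
    # compute (or take) the gradients, then zero out the global RVs' entries
    if isinstance(loss_or_grads, list):
        grads = loss_or_grads
    else:
        grads = theano.grad(loss_or_grads, params)
    masked = [tt.set_subtensor(g[global_slice], 0.) for g in grads]
    return adam(masked, params, **kwargs)

with model:
    approx = pm.fit(n=10000, method='advi', obj_optimizer=local_only_optimizer)
```

My guess is that Theano would still compute the full gradient vector here and merely discard part of it afterwards, but I have not profiled this.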
Best,
Mehrtash