Applying an MCMC Step to Sample from a Distribution with a KL Divergence Term

I attempted and failed to get DensityDist to work properly. I think I am just failing to wrap my head around Aesara and the ins and outs of TensorVariables, RandomVariables, SharedVariables, Functions, etc.

The code for my attempt is here; lines marked with “!!!” are the ones causing confusion. In pseudocode, what I’m trying to do is essentially:

def logp(mu, rho):
    q.mu.set_value(mu)
    q.rho.set_value(rho)
    kl = KL(q)
    lam = ... # a hyperparameter set elsewhere
    
    log_det_fisher = -2 * at.sum(at.log(at.diag(at.slinalg.cholesky(q.cov))))
    # Equation (10) from our paper
    return 1/2 * log_det_fisher - lam * kl.apply(f=None)

... 

with pm.Model() as mixing_distribution:
    pm.DensityDist("theta", # theta = variational params (could be called phi instead)
        dist_params=tuple(p.get_value() for p in self.q.params),
        logp=logp,
        ...)
    mixture = pm.sample(..., model=mixing_distribution)
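
To make that more concrete, here is roughly what I imagine a working version would look like, with everything built symbolically instead of via set_value. This assumes PyMC 4.x with Aesara, and n, lam, build_cov, and kl_term are stand-ins for the corresponding pieces of my real code; I am not at all confident the DensityDist call itself is right:

import aesara.tensor as at
import aesara.tensor.slinalg  # so that at.slinalg.cholesky resolves
import numpy as np
import pymc as pm

n = 10     # stand-in: length of mu (and of rho)
lam = 0.1  # stand-in for the hyperparameter set elsewhere


def build_cov(rho):
    # stand-in for however q's covariance is actually built from rho
    return at.diag(at.exp(rho))


def kl_term(mu, rho):
    # stand-in: KL(q || N(0, I)) for a diagonal Gaussian q, written symbolically
    return 0.5 * at.sum(at.exp(rho) + mu**2 - rho - 1.0)


def logp(value, lam_):
    # `value` is the flat vector theta = (mu, rho) proposed by the sampler;
    # `lam_` arrives through dist_params. Everything below stays symbolic --
    # no set_value, no aesara.function.
    mu, rho = value[:n], value[n:]
    cov = build_cov(rho)
    log_det_fisher = -2 * at.sum(at.log(at.diag(at.slinalg.cholesky(cov))))
    # Equation (10) from our paper
    return 0.5 * log_det_fisher - lam_ * kl_term(mu, rho)


with pm.Model() as mixing_distribution:
    pm.DensityDist("theta", lam, logp=logp, shape=2 * n, initval=np.zeros(2 * n))
    mixture = pm.sample()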

Here’s what I think the problems with this are, but please let me know if I’m way off base!

  1. q.mu.set_value and q.rho.set_value aren’t working the way I expect
  2. DensityDist expects logp to return a tensor, but I am returning a compiled aesara function (see the toy example after this list)
  3. I may be confused about the distinction between dist_params and theta
  4. I’m violating the pymc functional style by storing q, lam, and kl as instance variables of an object (only hinted at here by the self.q in the snippet, but it is how I approached it in the more complete attempt)
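
To make point 2 concrete, here is the toy distinction I think I’m tripping over (the names are purely illustrative):

import aesara
import aesara.tensor as at

x = at.vector("x")
expr = at.sum(x**2)             # a TensorVariable: a node in the symbolic graph
f = aesara.function([x], expr)  # a compiled Python callable, no longer a graph node

# My understanding is that the logp passed to DensityDist should build and
# return something like `expr` (an expression in terms of its inputs), whereas
# my attempt effectively hands it something like `f`.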

Thank you in advance for any further guidance!