Did you try calculating \|\nabla_\theta\left[\log p(D|\theta) + \log p(\theta) - \log q(\theta)\right]\|_2^2 instead? The differential operator is linear, so you can take a single gradient of the difference, which makes the operation a bit more lightweight.
PS: I haven't coded this yet, but it seems to be a more efficient solution
- you'd better use the normed tensors:
```python
diff_grad = tt.grad(self.logp_norm - self.logq_norm, self.symbolic_randoms)
```
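
Roughly, the idea would look like the following untested sketch. Here `logp`, `logq`, and `params` are hypothetical stand-ins for `self.logp_norm`, `self.logq_norm`, and `self.symbolic_randoms`, just to show the single-gradient-of-the-difference pattern:

```python
# Sketch only: squared L2 norm of grad[log p(D|theta) + log p(theta) - log q(theta)]
# computed with one tt.grad call on the scalar difference (linearity of the gradient),
# instead of differentiating logp and logq separately.
import theano
import theano.tensor as tt

params = [tt.vector('theta')]                 # stand-in for self.symbolic_randoms
logp = -0.5 * (params[0] ** 2).sum()          # stand-in for the (normed) log joint
logq = -0.5 * ((params[0] - 1.0) ** 2).sum()  # stand-in for the (normed) log q

# Single gradient of the difference w.r.t. all symbolic parameters
diff_grad = tt.grad(logp - logq, params)

# Squared L2 norm accumulated over all parameter tensors
grad_norm2 = sum((g ** 2).sum() for g in diff_grad)

f = theano.function(params, grad_norm2)
```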