I’m aware that HMC simulates Hamiltonian dynamics, numerically integrating differential equations along a trajectory, as the core mechanism of its proposal distribution. This needs the gradient of the log-density, so the target must be differentiable, and probability mass functions are understandably really tricky.
Increasingly, Bayesian Optimization is gaining traction within the combinatorial optimization community. In its most common form, it uses Gaussian Process Regression to learn a smooth function over observed data points. The predicted reward and the variance around that prediction together inform the acquisition function, which decides where the Bayesian Optimization implementation should exploit or explore next.
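To make that mechanism concrete, here is a minimal sketch in Python with NumPy. It assumes a 1-D input, a squared-exponential (RBF) kernel, a zero-mean prior with unit variance, and an upper-confidence-bound acquisition. The function names, toy data, and `kappa` value are all made up for illustration, and real Bayesian Optimization libraries differ in the details.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0):
    # Squared-exponential kernel between two 1-D arrays of inputs.
    diff = a[:, None] - b[None, :]
    return np.exp(-0.5 * (diff / length_scale) ** 2)

def gp_posterior(x_train, y_train, x_query, length_scale=1.0, noise=1e-6):
    # Posterior mean and variance of a zero-mean, unit-variance GP.
    K = rbf_kernel(x_train, x_train, length_scale) + noise * np.eye(len(x_train))
    K_s = rbf_kernel(x_query, x_train, length_scale)
    mean = K_s @ np.linalg.solve(K, y_train)
    v = np.linalg.solve(K, K_s.T)
    var = 1.0 - np.sum(K_s * v.T, axis=1)
    return mean, np.maximum(var, 0.0)  # clip tiny negative round-off

def ucb(mean, var, kappa=2.0):
    # Upper confidence bound: predicted reward plus an exploration bonus.
    return mean + kappa * np.sqrt(var)

# Toy observations (hypothetical): inputs and observed rewards.
x_train = np.array([0.0, 1.0, 3.0, 4.0])
y_train = np.array([0.1, 0.8, 0.3, 0.05])

# Score a grid of candidates and pick the one the acquisition prefers.
x_query = np.linspace(0.0, 4.0, 41)
mean, var = gp_posterior(x_train, y_train, x_query)
next_x = x_query[np.argmax(ucb(mean, var))]
print("next point to evaluate:", next_x)
```

A larger `kappa` weights the variance term more heavily and so favours exploration; a smaller one favours exploiting the current best prediction.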
Getting back to HMC for a PMF, it crossed my mind that GP regression might allow a smooth function to be learned over the PMF’s space, and this may (or may not!) be enough to compute a gradient. My knowledge of stochastic processes does not extend to stochastic calculus, so I can’t say whether it’s possible or easy to compute the gradient of a function produced by GP regression. From what I have read of stochastic calculus, the emphasis is on integration (Itô’s lemma, etc.), not on differentiation. So this idea might be dead in the water.
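As a rough sanity check on the gradient question, here is a sketch under some loudly stated assumptions: an RBF kernel, a zero-mean GP fitted to the log-PMF of a Poisson(3) on the integers 0 to 10, and only the posterior mean (not random draws from the GP) being differentiated. In that setting the posterior mean is a finite weighted sum of kernel functions, so its derivative follows from ordinary calculus. All names here are hypothetical, and this says nothing about whether a sampler built on such a surrogate would target the right distribution.

```python
import math
import numpy as np

def rbf_kernel(a, b, length_scale=1.0):
    # Squared-exponential kernel between two 1-D arrays of inputs.
    diff = a[:, None] - b[None, :]
    return np.exp(-0.5 * (diff / length_scale) ** 2)

def fit_weights(x_train, y_train, length_scale=1.0, noise=1e-6):
    # Weights w such that the posterior mean is m(x) = sum_i w_i * k(x, x_i).
    K = rbf_kernel(x_train, x_train, length_scale) + noise * np.eye(len(x_train))
    return np.linalg.solve(K, y_train)

def posterior_mean(x, x_train, weights, length_scale=1.0):
    return rbf_kernel(np.atleast_1d(x), x_train, length_scale) @ weights

def posterior_mean_grad(x, x_train, weights, length_scale=1.0):
    # For the RBF kernel, d/dx k(x, x_i) = -(x - x_i) / l^2 * k(x, x_i).
    x = np.atleast_1d(x)
    k = rbf_kernel(x, x_train, length_scale)
    dk = -(x[:, None] - x_train[None, :]) / length_scale**2 * k
    return dk @ weights

# Log-PMF of a Poisson(3), evaluated on a truncated integer support.
rate = 3.0
support = np.arange(0, 11, dtype=float)
log_pmf = np.array([k * math.log(rate) - rate - math.lgamma(k + 1) for k in support])

weights = fit_weights(support, log_pmf)

# Compare the closed-form gradient with a central finite difference.
x0, eps = 2.5, 1e-5
grad = posterior_mean_grad(x0, support, weights)
fd = (posterior_mean(x0 + eps, support, weights)
      - posterior_mean(x0 - eps, support, weights)) / (2 * eps)
print("analytic:", grad[0], "finite difference:", fd[0])
```

The surrogate only matches the log-PMF at the integers it was fitted on; what it does between and beyond them depends entirely on the kernel and length scale chosen here.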
Anyway, I wanted to ask: is this idea interesting, viable, or already explored?
Edit: It crossed my mind that there already are a number of distributions that are analogous across the discrete/continuous paradigms; for example Bernoulli & Beta, Poisson & Gamma, Categorical & Dirichlet. And so, coupled with the above, the juice just might not be worth the squeeze when it’s often possible to identify a continuous distribution roughly analogous to the discrete one the user would prefer to use.