I see what you are trying to do here, although this is not quite what I have in mind…
So I would say that these experiments are mainly about seeing, given the current infrastructure in the different backends, how we can glue the different elements together to do what we want (i.e., building a model and performing inference), and then how we can turn that code into a stable, easy-to-use API.
Take the example of Theano: there is no distribution class, so we would actually need to write our own distributions. We would likely face similar difficulties in MXNet.
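To make that concrete, here is a minimal sketch of what a hand-rolled distribution could look like in Theano (the function name and signature are mine, not from any existing codebase):

```python
import numpy as np
import theano.tensor as tt

def normal_logp(value, mu, sigma):
    # log-density of a Normal, written by hand since Theano
    # ships no distribution classes
    return (-0.5 * ((value - mu) / sigma) ** 2
            - tt.log(sigma) - 0.5 * np.log(2 * np.pi))

x = tt.dscalar("x")
logp = normal_logp(x, 0.0, 1.0)
dlogp = tt.grad(logp, x)  # Theano's autodiff still gives us dlogp for free
```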
However, for TensorFlow and PyTorch, since distributions are already implemented, the experiment would mostly be about how to build a valid model using those distributions. To me, this is the major step: I have no doubt that the HMC implementation could sample from an energy function (logp in our case) if it were written in TensorFlow or PyTorch tensors. The difficult part is getting logp and dlogp automatically.
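As a rough illustration of what "getting logp and dlogp" means in PyTorch (a toy one-variable sketch, not tied to any particular experiment):

```python
import torch
from torch.distributions import Normal

# a toy "model": x ~ Normal(0, 1), evaluated at a single point
x = torch.tensor(0.5, requires_grad=True)
logp = Normal(0.0, 1.0).log_prob(x)    # logp from the built-in distribution
dlogp, = torch.autograd.grad(logp, x)  # dlogp via autograd
print(logp.item(), dlogp.item())
```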
If you look at my experiment in Pyro, e.g., Cell 35, Pyro can extract logp and dlogp if you wrap the random variables in a function (the model). Specifically, when you call `hmc_kernel = HMC(chain_gaussian2, **hmc_params)`, the resulting `hmc_kernel` contains fields such as the logp.
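For reference, a minimal sketch of that pattern (I am writing against the current Pyro API, which may differ from the notebook's version; `chain_gaussian2` here is a stand-in for the model in Cell 35, and the `hmc_params` values are assumptions):

```python
import pyro
import pyro.distributions as dist
from pyro.infer import HMC, MCMC

def chain_gaussian2():
    # stand-in model: a chain of two Gaussians
    a = pyro.sample("a", dist.Normal(0., 1.))
    return pyro.sample("b", dist.Normal(a, 1.))

hmc_params = {"step_size": 0.1, "num_steps": 4}  # assumed values
hmc_kernel = HMC(chain_gaussian2, **hmc_params)
mcmc = MCMC(hmc_kernel, num_samples=500, warmup_steps=100)
mcmc.run()
print(mcmc.get_samples()["a"].mean())
```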
So for these experiments, I would say the first aim/step is to try to extract logp automatically.
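In Pyro, one way to do that extraction by hand is through the trace machinery. A hedged sketch, reusing `chain_gaussian2` from above (the conditioning values are made up):

```python
import torch
import pyro
import pyro.poutine as poutine

# pin the random variables to concrete values so the joint logp is a number
values = {"a": torch.tensor(0.3, requires_grad=True),
          "b": torch.tensor(-0.1, requires_grad=True)}
conditioned = pyro.condition(chain_gaussian2, data=values)

trace = poutine.trace(conditioned).get_trace()
logp = trace.log_prob_sum()  # joint log-probability of the model
logp.backward()              # dlogp via autograd
print(logp.item(), values["a"].grad, values["b"].grad)
```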