Comparing minimum ESS rates across different models

It is common to use minimum ESS/time (effective sample size per unit time) across all parameters as a summary of MCMC sampler efficiency. Suppose two similar models M_1 and M_2 both use parameters \alpha but differ in the rest, i.e. only M_i uses \beta_i. Is it sensible to compare min ESS/time where the minimum is taken over all parameters? Or is it more sensible to take the minimum over \alpha only? Or is any comparison fundamentally flawed since the models are different?

Two specific examples in my case:

  1. My models all involve multiple dependent Gaussian processes, but the dependency structure differs across models. Call the parameters specifying the dependency structure the GP hyperparameters. All models use the same GP variables, but the GP hyperparameters differ.
  2. The GPs are used to generate latent discrete variables, which then generate observed discrete variables. One approach marginalises out the latent discrete variables; another samples them during MCMC. This is the case I care more about: is it okay to compare minimum ESS rates where the minimum is taken over the GP variables only, since those are what the two approaches share?
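To make the two candidate summaries concrete, here is a small sketch of the comparison. The traces, parameter names (`f`, `beta_1`, `beta_2`), and wall-clock times are all hypothetical, and the `ess` function is a rough FFT-autocovariance estimator with a Geyer-style positive-pair truncation, not a substitute for the estimators in ArviZ or Stan:

```python
import numpy as np

def ess(x):
    """Rough effective sample size of a 1-D chain: FFT autocovariance
    plus a Geyer-style truncation (stop when paired autocorrelations
    go negative). A sketch, not a production estimator."""
    x = np.asarray(x, dtype=float)
    n = x.size
    x = x - x.mean()
    f = np.fft.rfft(x, n=2 * n)
    acov = np.fft.irfft(f * np.conj(f), n=2 * n)[:n] / n
    rho = acov / acov[0]
    tau = 1.0  # integrated autocorrelation time
    for k in range(1, n - 1, 2):
        pair = rho[k] + rho[k + 1]
        if pair < 0.0:
            break
        tau += 2.0 * pair
    return n / tau

def ar1(n, phi, rng):
    """Stand-in 'MCMC trace': a stationary AR(1) series whose
    autocorrelation phi mimics better or worse mixing."""
    x = np.empty(n)
    x[0] = rng.standard_normal()
    for t in range(1, n):
        x[t] = phi * x[t - 1] + np.sqrt(1.0 - phi**2) * rng.standard_normal()
    return x

rng = np.random.default_rng(0)
n = 4000

# Hypothetical traces: "f" (the GP values) is shared by both models;
# each model also has its own hyperparameter beta_i.
draws_m1 = {"f": ar1(n, 0.5, rng), "beta_1": ar1(n, 0.9, rng)}
draws_m2 = {"f": ar1(n, 0.5, rng), "beta_2": ar1(n, 0.3, rng)}
time_m1, time_m2 = 12.0, 30.0  # hypothetical wall-clock seconds

# Option A: minimum over the shared parameters only.
shared = ["f"]
rate_shared_m1 = min(ess(draws_m1[p]) for p in shared) / time_m1
rate_shared_m2 = min(ess(draws_m2[p]) for p in shared) / time_m2

# Option B: minimum over all parameters. This can only be <= option A,
# and for M_1 it is dragged down by the poorly mixing beta_1.
rate_all_m1 = min(ess(v) for v in draws_m1.values()) / time_m1
rate_all_m2 = min(ess(v) for v in draws_m2.values()) / time_m2
```

The point of the sketch is that the two summaries can disagree: a model whose non-shared parameter mixes badly is penalised under option B but not under option A.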

I guess it depends on what you mean by “sensible”, or on how you would interpret such a comparison.

The ESS/time quantity is made up of two components: ESS and time. The ESS calculation tells you how much information you are getting for each parameter (e.g., the chain can very effectively explore different values of one parameter while being relatively “stuck” trying to find new values for another). So the ESS of two parameters could be compared (again, depending on what you make of the comparison).

However, the overall speed will be very dependent on the entire model (and your data). So differences between the ESS/time of a parameter in one model and the ESS/time of a parameter in another model may reflect differences in overall sampling speed (e.g., samples per second) rather than differences in mixing. If one entire model + data combination presents a more difficult geometry to sample from, overall sampling will be slowed, and it will be difficult to compare the ESS/time of any parameter in that model to a similar parameter in a different, more easily sampled model.
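The decomposition into the two components can be written out explicitly: ESS/second = (ESS/iteration) × (iterations/second). The numbers below are purely hypothetical, chosen to illustrate the trade-off in the question's second example, where marginalising the latent discretes might mix better per iteration but cost more per iteration:

```python
# ESS/time factors into a mixing term and a raw-speed term:
#   ESS / second = (ESS / iteration) * (iterations / second)
n_iter = 10_000

# Marginalising the latent discretes: better mixing, costlier iterations.
ess_marg, seconds_marg = 6_000.0, 600.0
# Sampling the latent discretes: worse mixing, cheaper iterations.
ess_lat, seconds_lat = 2_500.0, 120.0

mixing_marg = ess_marg / n_iter        # ESS gained per iteration
speed_marg = n_iter / seconds_marg     # iterations per second
rate_marg = mixing_marg * speed_marg   # equals ess_marg / seconds_marg

mixing_lat = ess_lat / n_iter
speed_lat = n_iter / seconds_lat
rate_lat = mixing_lat * speed_lat      # equals ess_lat / seconds_lat
```

Under these (made-up) numbers the marginalised sampler mixes better per iteration yet loses on ESS/time once per-iteration cost is included, which is why a raw ESS/time comparison cannot say which component is responsible unless both are reported.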