Here’s a bit of motivation for what prompted me to ask this.
I’m working with a model that has a parameter that takes positive values. When I give it a positive/nonnegative prior with support on $[0, \infty)$, pymc log-transforms it for the sampler.
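For reference, here is the standard change of variables behind this transform, in generic notation (not PyMC's internal code). Writing $y = \log x$, the sampler targets

$$
\log p_Y(y) = \log p_X\!\left(e^{y}\right) + y, \qquad x = e^{y},
$$

where the extra $+y$ is the log-Jacobian $\log\lvert dx/dy\rvert$. A step of size $\delta$ in $y$ therefore multiplies $x$ by $e^{\delta}$ instead of adding a fixed amount to it.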
This is a black-box model without gradient information, so we use a more traditional step method; let’s say Metropolis-Hastings with a uniform proposal, to keep it simple. What I am struggling to think through is how the change in scale (due to the log transform) affects the efficiency of the sampler’s proposals.
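A minimal sketch of that setup, assuming a toy target in place of the real model: plain NumPy random-walk Metropolis on $y = \log x$ with a symmetric uniform proposal. This is not PyMC's `Metropolis` implementation; `log_density_original` is a hypothetical Gamma(2, 1) kernel standing in for prior plus black-box likelihood, and the other function names are illustrative too.

```python
import numpy as np


def log_density_original(x):
    """Hypothetical unnormalized log density on the original (positive) scale.

    Stands in for prior + black-box log-likelihood; here a Gamma(2, 1)
    kernel, log(x) - x, chosen only for illustration.
    """
    if x <= 0:
        return -np.inf
    return np.log(x) - x


def log_density_log_scale(y):
    """Target seen by the sampler after the transform y = log(x).

    Adds the log-Jacobian term log|dx/dy| = y.
    """
    return log_density_original(np.exp(y)) + y


def rw_metropolis_log_scale(y0, width, n_steps, rng):
    """Random-walk Metropolis with a uniform proposal of total `width` on the log scale."""
    y = y0
    logp = log_density_log_scale(y)
    samples = np.empty(n_steps)
    n_accept = 0
    for i in range(n_steps):
        # Symmetric uniform proposal, so no Hastings correction is needed.
        y_prop = y + rng.uniform(-width / 2, width / 2)
        logp_prop = log_density_log_scale(y_prop)
        if np.log(rng.uniform()) < logp_prop - logp:
            y, logp = y_prop, logp_prop
            n_accept += 1
        samples[i] = y
    # Map the chain back to the original scale; also report the acceptance rate.
    return np.exp(samples), n_accept / n_steps


rng = np.random.default_rng(0)
x_samples, acc_rate = rw_metropolis_log_scale(0.0, 1.0, 20_000, rng)
print(x_samples.mean(), acc_rate)
```

The sample mean should land near 2, the mean of a Gamma(2, 1) distribution, even though every proposal is made on the log scale.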
Suppose the proposal width is 1 on the log scale. At the lower end of the parameter range, say a window from 0.01 to 1.01 on the log scale, the two most distant possible proposals are $e^{1.01}-e^{0.01} \approx 1.74$ apart in original units. But suppose we begin the M-H step further up the number line, at 6.5 on the log scale (about 665 in original units). Then the two extreme proposals are $e^{7}-e^{6} \approx 693.2$ apart in original units. Exponential growth is fast!
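The arithmetic above can be reproduced with a few lines of NumPy; `original_scale_span` is a hypothetical helper name, and the centres other than 0.51 and 6.5 are arbitrary extra points.

```python
import numpy as np


def original_scale_span(center_log, width=1.0):
    """Distance in original units between the two extreme proposals of a
    uniform proposal of total `width` centred at `center_log` on the log scale."""
    lo = center_log - width / 2
    hi = center_log + width / 2
    return np.exp(hi) - np.exp(lo)


for c in [0.51, 2.0, 4.0, 6.5]:
    print(f"log-scale centre {c:4.2f}: span {original_scale_span(c):8.2f} in original units")
```

Algebraically the span is $e^{c}\left(e^{w/2} - e^{-w/2}\right) = 2\sinh(w/2)\,x$ with $x = e^{c}$, i.e. about $1.04\,x$ for $w = 1$: the step is constant in relative terms, not in absolute terms, so it grows in proportion to the current value.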
Perhaps this makes it easier to explore the posterior close to zero. But I’ve also run into an issue with this in some models: sometimes the step method (Metropolis or Slice) proposes values so large (in original units) that they cause numerical issues or exceptions when passed to the black-box likelihood.
Now, perhaps this could be addressed by setting a tighter prior on the positive parameter. Perhaps one could also modify the black-box likelihood to return logp = -inf (or similar) when it encounters extreme parameter values. However, disabling the log transform and sampling on the original scale also prevents the issue.
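A minimal sketch of the "return -inf" guard, assuming a plain Python callable as the black box. `black_box_loglike`, `guarded_loglike`, and the `max_value` cap are all hypothetical, not PyMC settings. One relevant float64 fact: `np.exp(y)` overflows to `inf` once `y` exceeds about 709.78 (the log of the largest float64), so very large log-scale proposals can reach the likelihood as infinities.

```python
import numpy as np


def black_box_loglike(x):
    """Hypothetical stand-in for an external model that breaks on extreme inputs."""
    if x > 1e6:
        raise OverflowError("model diverged")
    return -0.5 * (x - 3.0) ** 2


def guarded_loglike(x, max_value=1e5):
    """Return -inf instead of calling the model on values it cannot handle.

    `max_value` is a hypothetical problem-specific cap, not a PyMC setting.
    """
    # Reject non-finite, non-positive, or implausibly large inputs up front.
    if not np.isfinite(x) or x <= 0 or x > max_value:
        return -np.inf
    try:
        out = black_box_loglike(x)
    except (OverflowError, FloatingPointError, ValueError):
        # Treat a crash inside the model as zero likelihood.
        return -np.inf
    # Also catch NaN/inf results that the model returns without raising.
    return out if np.isfinite(out) else -np.inf
```

With a Metropolis-type step, a proposal whose log density is -inf is simply rejected, so the guard trades an exception for a wasted proposal.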
Is this a case where the untransformed space could be preferable? What would I be missing out on?
N.B. I am still trying to make a minimal working example that uses only pymc and demonstrates the extreme-parameter-value issue, but I haven’t managed it yet…
A related example of what I am grappling with: suppose I know that a reasonable proposal distribution for my particular problem, in original units, is $U(-b, b)$. I don’t know of a good way to translate this knowledge directly to the log-transformed space without it potentially getting badly distorted.
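One way to see the distortion is to map the original-scale window $[x - b, x + b]$ onto the log scale at a few current values $x$. This is a sketch; `implied_log_interval` and the value `b = 2.0` are hypothetical choices for illustration.

```python
import numpy as np


def implied_log_interval(x, b):
    """Map the original-scale proposal window [x - b, x + b] to the log scale.

    Returns (lower, upper) offsets relative to log(x). The lower end is -inf
    when x <= b, because part of the window falls at or below zero.
    """
    lower = np.log(x - b) - np.log(x) if x > b else -np.inf
    upper = np.log(x + b) - np.log(x)
    return lower, upper


b = 2.0  # hypothetical original-scale half-width
for x in [1.0, 5.0, 50.0, 500.0]:
    lo, hi = implied_log_interval(x, b)
    # First-order approximation: log(x +/- b) - log(x) ~ +/- b / x when b << x
    print(f"x={x:6.1f}: log-scale offsets [{lo:7.3f}, {hi:6.3f}], b/x = {b / x:.4f}")
```

The implied window is asymmetric around $\log x$ and shrinks roughly like $b/x$, so no single fixed log-scale width reproduces $U(-b, b)$ everywhere. Letting the log-scale width depend on the current value would make the proposal asymmetric, so an M-H step built that way would need a Hastings correction.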