Non-centered hierarchical model with constraints

I try to model standard deviation, which obviously must be positive, but in the centered parameterization I get a handful of divergences. I tried to naively re-parameterize with sigma = pm.Deterministic('sigma', global_sigma + offset * global_sigma_sd), and got a Bad Initial Energy error when sampling.
then in order to put a lower bound on the deterministic variable I tried with sigma.clip(0.0, np.inf) and tt.switch(sigma<0, 0.0, sigma), but again I always get a Bad Initial Energy error when sampling.
When instead of a 0-lower bound I tried 1e-15, with clip I get:

ValueError: Mass matrix contains zeros on the diagonal.

and with tt.switch:

The chain contains only diverging samples. The model is probably misspecified.

I tested with a latent variable:

offset = pm.Normal('offset', mu=0, sigma=1.0)
b_offset = pm.Bound(pm.Normal, lower=0.0)('bound_offset', mu=global_sigma + offset, sigma=1)
sigma = pm.Deterministic('sigma', b_offset * global_sigma_sd)

which gave no divergences and took 2-4 times longer to sample, but a + b * c != (a + b) * c, so that doesn’t seem like a good idea.
How can this be solved? Is the best option to transform the data, e.g. square demeaned data and log-transform and then model mu instead of sigma?
This is related to this ongoing struggle.