Models with low observed uncertainty compared to RV uncertainty

I have a problem where I am trying to model the output of some complicated process using surrogate models (I’ve asked several questions in this regard). My surrogate model is a Gaussian process fit to data generated by my expensive model. The GP is parameterized so that the parameters of my expensive model are the inputs and the measurements of interest are the outputs.
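For context, here is a minimal sketch of the kind of surrogate described above, assuming scikit-learn's GaussianProcessRegressor; the function expensive_model, the parameter dimensions, and the design points are all made up, and the actual GP in my setup may be built differently:

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

def expensive_model(params):
    """Stand-in for the real expensive simulation: parameters in, measurement out."""
    return np.sin(params).sum()

rng = np.random.default_rng(0)
param_samples = rng.uniform(0, 1, size=(50, 3))                        # design points in parameter space
measurements = np.array([expensive_model(p) for p in param_samples])   # expensive-model outputs

kernel = ConstantKernel(1.0) * RBF(length_scale=np.ones(3))
surrogate = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
surrogate.fit(param_samples, measurements)

# The fitted GP now plays the role of the surrogate: parameters in, predicted measurement out
mean_pred, std_pred = surrogate.predict(param_samples[:5], return_std=True)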

The measurements of my model are known with a high degree of certainty compared to the uncertainties on my parameters: if the variation in my model's output due to the parameters is of order 1, the variation in my measured test data is of order 1e-3 to 1e-4. I tried defining my likelihood like this:

unknown_sigma = pm.HalfNormal('unk_sigma', sd=some_value)          # extra noise to be inferred
sigma = pm.Deterministic('sigma', unknown_sigma + observed_sigma)  # total observation noise
y = pm.Normal('y', mu=surrogate_out, sd=sigma, observed=data)

but I get a lot of divergences. Does anyone have ideas on how to reparameterize this model so that I can avoid this problem?

Difficult to say without the data and the model, but:

did you try scaling the measured data?

For example, putting it into a range of 0 - 1 or 0 - 10 would make it easier to handle numerically.
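For instance, a minimal sketch of that min-max rescaling, with made-up numbers standing in for `data` from the snippet above:

import numpy as np

# Made-up observations standing in for `data`
data = np.array([2999.8, 3000.1, 3000.4, 2999.9])

# Min-max scale into [0, 1]; multiply by 10 for a 0-10 range instead
data_scaled = (data - data.min()) / (data.max() - data.min())

# The surrogate mean (mu=surrogate_out) and any noise scales would need the same
# transformation so the likelihood compares quantities on the same scale.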

I haven't tried scaling the data yet, but I can give that a shot. I will still have a much larger variance in the output than in the input, though.

Is there a way to start with a higher tolerance and, either in sequential samplings or during a single trace, decrease it once the parameters have been better estimated?

Not sure I understand: tolerance in terms of what? scaling?

Sorry, that is what I meant. What I was thinking is: would it be possible to do something like this:

with pm.Model() as model:
    # ... define the rest of my model here ...

    y = pm.Normal('y', mu=surrogate_out, sd=factor * sigma, observed=data)

where factor can start as a large number and be gradually decreased as the sampling converges?
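To make the question concrete, here is a rough sketch of how such a factor could be decreased between sequential sampling runs, assuming PyMC3 with a Theano shared variable; the surrogate output, noise level, and data below are placeholders:

import numpy as np
import pymc3 as pm
import theano

# Placeholders standing in for the surrogate prediction and the observations
surrogate_out = 1.0
observed_sigma = 1e-3
data = np.array([1.0012, 0.9991, 1.0004])

factor = theano.shared(100.0)  # start with a heavily inflated noise scale

with pm.Model() as model:
    unknown_sigma = pm.HalfNormal('unk_sigma', sd=1.0)
    sigma = pm.Deterministic('sigma', unknown_sigma + observed_sigma)
    y = pm.Normal('y', mu=surrogate_out, sd=factor * sigma, observed=data)

    trace_wide = pm.sample()   # first pass with the large factor

factor.set_value(1.0)          # tighten the noise scale
with model:
    trace_tight = pm.sample()  # second pass under the (nearly) original likelihood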

I don't think that is a valid thing to do. First, you need to know what kind of convergence you are looking for (i.e., how do you define convergence); and second, I think it is more important to investigate why a model does not converge (or produces divergent samples). An automatic process doesn't seem to be the answer here.

That makes sense.

What I think is happening is that, because my observed variance is so small, the normal distribution I'm using for y just can't find the observed value. My thought was to artificially inflate my observed sigma using the factor to some value at which I could get convergence, and then use the median of the resulting trace as an initial guess as I reduced the factor.
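A rough sketch of that idea, assuming PyMC3 and placeholder values for the surrogate output and data (whether it is statistically sound is a separate question, as noted above):

import numpy as np
import pymc3 as pm

# Placeholders standing in for the surrogate prediction and the observations
surrogate_out = 1.0
observed_sigma = 1e-3
data = np.array([1.0012, 0.9991, 1.0004])

def build_model(factor):
    with pm.Model() as model:
        unknown_sigma = pm.HalfNormal('unk_sigma', sd=1.0)
        sigma = pm.Deterministic('sigma', unknown_sigma + observed_sigma)
        pm.Normal('y', mu=surrogate_out, sd=factor * sigma, observed=data)
    return model

# First pass with artificially inflated observation noise
with build_model(factor=100.0):
    trace_wide = pm.sample()

# Use the posterior median as the starting point when the factor is reduced
start = {'unk_sigma': float(np.median(trace_wide['unk_sigma']))}
with build_model(factor=1.0):
    trace_tight = pm.sample(start=start)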