# How to prevent a domain error?

Hi,

I am trying to fit a simple linear model with a Gamma likelihood, which requires mu to lie between 0 and 1.

My simple code is:

```python
def fn(a, b, x):
    y_hat = a + b * x
    return y_hat

with pm.Model() as model:
    a = pm.HalfNormal('a', 10)
    b = pm.HalfNormal('b', 10)
    mu = fn(a, b, x)
    pm.Gamma('obs', mu=mu, sigma=sigma, observed=y)
    idata = pm.sample(4000, tune=2000, random_seed=rng, chains=2,
                      target_accept=0.99, init="advi")
```


With this model, sampling fails whenever mu = fn(a, b, x) produces a value larger than 1 or smaller than 0.

Is there a way to penalize the likelihood whenever fn produces a value larger than 1 or smaller than 0? Something like:

```python
pm.Potential("negative_penalty", pm.math.switch(mu < 0, -np.inf, 0))
```


Do you have any tips for building models whose likelihood functions require specific domain ranges?

Thanks!

The pm.Potential approach you propose works, but it will introduce discontinuities into the geometry of the likelihood function. This makes it harder on the NUTS sampler. So I would use that as a last resort.

If you know you want mu to be between 0 and 1, you can apply the sigmoid (inverse logit) function to the output of fn, like:

```python
pm.Gamma('obs', mu=pm.invlogit(mu), sigma=sigma, observed=y)
```

As a rule of thumb, it’s better to let the sampler propose unconstrained values, then transform those values to meet your constraints, rather than trying to build in hard constraints.
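To see why this works, here is a quick standalone check (plain Python, no PyMC; the `invlogit` helper below is just a local re-implementation of the sigmoid) showing that the inverse logit maps any unconstrained value into the open interval (0, 1):

```python
import math

def invlogit(z):
    # Sigmoid / inverse logit: 1 / (1 + exp(-z))
    return 1.0 / (1.0 + math.exp(-z))

# Whatever unconstrained value the sampler proposes for a + b*x,
# the transformed mean always lands strictly inside (0, 1).
for z in [-30.0, -2.0, 0.0, 2.0, 30.0]:
    mu = invlogit(z)
    assert 0.0 < mu < 1.0

print(invlogit(0.0))  # prints 0.5: the midpoint of the unconstrained scale
```

This is the same idea PyMC uses internally when it samples constrained parameters on an unconstrained transformed space.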

I see. For pm.invlogit(mu), do I also need to transform y with the sigmoid function?

My question is: if we do not transform the observed y with the sigmoid, how can we make inferences about our parameters?

For example,

case 1)
With pm.Normal('obs', mu=a*x, sigma=1, observed=y), it is easy to interpret the estimated a. If a is around 0.5, we can say that y changes half as much as x.

case 2)
On the other hand, with pm.Normal('obs', mu=pm.invlogit(a*x), sigma=1, observed=y), it may be impossible to make an inference like in case 1 (am I wrong?).

case 3)
If we use pm.Normal('obs', mu=pm.invlogit(a*x), sigma=1, observed=sigmoid_y), we may be able to make the same inference as in case 1.

Am I misunderstanding the process?

To make better inferences and fit the model with a better likelihood function, do I need to pursue case 3)?

In all cases, you can answer these questions by taking partial derivatives of the expected value of \hat{y}_i under your model.

In the normal case, the expected value of \hat{y}_i is just \mu, so if you have a linear function for \mu, then \frac{\partial \hat{y}_i}{\partial x_i} = a: a unit change in x_i corresponds to a change of size a in your prediction for the i-th object in your data.
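As a toy numeric illustration of this (the values of a and x are arbitrary, not tied to the model above): with a linear mean, a finite-difference slope of the prediction recovers the coefficient a directly:

```python
a = 0.5  # hypothetical coefficient

def y_hat(x):
    # Expected value under a Normal likelihood with a linear mean: E[y | x] = a * x
    return a * x

# Finite-difference approximation of d(y_hat)/dx at an arbitrary point
eps = 1e-6
slope = (y_hat(2.0 + eps) - y_hat(2.0)) / eps

# For a linear mean the slope is the coefficient itself, everywhere.
assert abs(slope - a) < 1e-6
```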

If you apply a sigmoid, you end up with \mathbb{E}[\hat{y}_i] = \frac{1}{1+e^{-ax_i}}, so the partial derivative w.r.t. x_i will end up depending on \hat{y}_i: \frac{\partial \hat{y}_i}{\partial x_i} = \frac{a e^{-ax_i}}{(1 + e^{-ax_i})^2} = a \hat{y}_i (1 - \hat{y}_i) = \frac{a \hat{y}_i}{1 + e^{ax_i}}

Here it’s not impossible to interpret the output, but it is difficult, because 1) the size of the effect of a change in x_i depends on the value of \hat{y}_i, and 2) the effect of a change in x_i is non-linear. So it’s not so neat and tidy, but people do it. You could evaluate the derivative at meaningful points (the mean? median? mode?), or at values of x_i corresponding to particular objects of interest in your data set.
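The derivative identity above can be checked with a central finite difference (a standalone sketch; the values of a and x_i are arbitrary):

```python
import math

a = 0.7  # hypothetical coefficient
x = 1.3  # arbitrary evaluation point

def y_hat(x):
    # Prediction with a sigmoid-transformed linear mean
    return 1.0 / (1.0 + math.exp(-a * x))

# Analytic form: a * y_hat * (1 - y_hat), which depends on y_hat itself
analytic = a * y_hat(x) * (1.0 - y_hat(x))

# Central finite difference as an independent check
eps = 1e-6
numeric = (y_hat(x + eps) - y_hat(x - eps)) / (2.0 * eps)

assert abs(analytic - numeric) < 1e-8
```

Note how the analytic form shrinks toward zero as \hat{y}_i approaches 0 or 1, which is why the effect size depends on where you evaluate it.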

Note that transforming the data has nothing to do with any of this. Once you leave linear relationships, interpreting coefficients gets more difficult and there’s not a lot you can do about it.

That being said, you might explore different transformations of your data to make them more normal and avoid these issues altogether.


Thanks, Jesse. Yes, I agree that the trade-off between transformation and interpretability is common. But as you know, in many cases transforming data to a Normal distribution is almost impossible.

A follow-up question arose for the case where we cannot transform our observed data to a Normal distribution. For example, if we use a different likelihood function, such as the Beta distribution, the mean would be \frac{\alpha}{\alpha + \beta}. In this case, would the expected value of \hat{y}_i be \frac{\alpha}{\alpha + \beta}?
In addition, with the Beta likelihood (without any data transformation), is the partial derivative w.r.t. x_i still \frac{\partial \hat{y}_i}{\partial x_i} = a? In other words, how can we interpret a with a Beta likelihood? Can we still assert that "a unit change in x_i corresponds to a change of size a in my prediction for the i-th object in my data"?

If your likelihood is Beta distributed, you are in the so-called beta-regression case from the GLM literature. You could look at, for example, research like this for some insight on how to interpret the parameters.


Thank you for sharing the article. Yes, it is much more complicated than what I expected. Thank you so much!