GLM with a truncated Gamma distribution

Hi everyone!

I’m trying to build a linear model where I have a random variable with a Gamma likelihood, but the distribution is truncated from below at a minimum value.
I found the code here which explains how to build a Gamma linear model with a linear link function. However, I am unsure how to truncate that at a minimum value, since the pm.Truncated class cannot take a pm.CustomDist as its dist argument.

How would I extend the logic that follows (copied directly from the linked post) to include a truncation for my linear model?

NOTE: I do not wish to simply truncate the noise at some lower value a, since once I add \mu to it, the resulting value can still be less than a (if \mu is negative).

import pymc as pm

def gamma_noise_regression(mu, alpha, beta, size=None):
    return mu + pm.Gamma.dist(alpha=alpha, beta=beta, size=size)

with pm.Model() as model:
    intercept = pm.Normal('intercept', sigma=10)
    slope = pm.Normal('slope', sigma=10)
    mu = intercept + slope * x
    
    alpha = pm.Exponential('alpha', 1)
    beta = pm.Exponential('beta', 1)
    
    y_obs = pm.CustomDist('y_obs', mu, alpha, beta, dist=gamma_noise_regression, observed=y)
    idata = pm.sample(init='jitter+adapt_diag_grad')

The gamma noise in this model is already “truncated” at zero, so it should be enough to constrain mu to be positive. Depending on the values of x, you could switch the priors on intercept and slope to be strictly positive (HalfNormal?), or apply a deterministic transformation to mu (Exp? Softplus?).
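
For example, a minimal sketch of the Softplus option (reusing the names from the snippet above; untested):

with pm.Model() as model:
    intercept = pm.Normal('intercept', sigma=10)
    slope = pm.Normal('slope', sigma=10)
    # log1pexp is the softplus function, so mu stays strictly positive but smooth
    mu = pm.math.log1pexp(intercept + slope * x)
    ...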

I could constrain mu, but then I might be “over-truncating” it in cases where the unconstrained mu + Gamma is already above my threshold a.

I’ve gotten some part of the way there, but I need some more help with the logp implementation. I’ve modified my code as follows:

def gamma_noise_regression(mu, alpha, beta, size=None):
    combined_value = mu + pm.Gamma.dist(alpha=alpha, beta=beta, size=size)
    alternate_value = pm.DiracDelta.dist(3.0, size=size)
    return pm.math.switch(combined_value >= 3.0, combined_value, alternate_value)

def logp_gamma_noise_regression(value, mu, alpha, beta):
    combined_value_logp = pm.logp(mu + pm.Gamma.dist(alpha=alpha, beta=beta), value)
    alternate_value_logp = pm.logp(pm.DiracDelta.dist(3.0), value)
    return pm.math.switch(value >= 3.0, combined_value_logp, alternate_value_logp)

I want to use my new logp function as part of the CustomDist call, but I don’t think the logp is correctly implemented for a truncated random variable, since there should be a bit more mass to the right of the truncation value (in this case a = 3.0).

You can use a pm.Potential term if you just want the sampler to reject samples below a certain threshold, something like pm.Potential('threshold_penalty', pm.math.switch(y_hat < a, -np.inf, 0)).

Potentials are added onto the model’s logp, so this would give a log probability of -inf to any samples of y_hat less than your threshold.
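
Spelled out inside a model block, it would look roughly like this (y_hat here is a placeholder for whatever model quantity has to stay above the threshold a):

import numpy as np
import pymc as pm

a = 3.0  # truncation threshold
with pm.Model() as model:
    ...
    # contributes -inf to the model logp whenever y_hat falls below the threshold
    pm.Potential('threshold_penalty', pm.math.switch(y_hat < a, -np.inf, 0))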

I don’t recommend this, though, because it introduces a hard boundary into the geometry of your logp function. For complex models, this will give NUTS a hard time.

I’m not sure I understand your point about over-truncating. If mu is bounded from below at a, and the noise is strictly positive, the quantity mu + noise is on the interval [a, \infty). What’s the issue?

When I use the following code (my priors and everything else are a bit different, so the numbers are bigger):

def gamma_noise_regression(mu, alpha, beta, size=None):
    return mu + pm.Gamma.dist(alpha=alpha, beta=beta, size=size)

Then I get the following distribution of values when making predictions on new data:
[figure: histogram of predicted values without any truncation]
The vertical line passes through 4020, which is where my truncation point is (values below this threshold are infeasible).

However, when I use the following code (flooring mu at 4020 so it cannot go below the truncation point):

def gamma_noise_regression(mu, alpha, beta, size=None):
    return pm.math.maximum(mu, 4020) + pm.Gamma.dist(alpha=alpha, beta=beta, size=size)

I get the following distribution of predicted values (along with about 2% divergences):
[figure: histogram of predicted values with mu floored at 4020]
As you can see, the minimum of the values is quite far from the truncation point. What I want is to get the same distribution as in the first figure, but cut off at the truncation point (4020).

Also, when I use the following code:

def gamma_noise_regression(mu, alpha, beta, size=None):
    return 4020 + pm.math.log1pexp(mu) + pm.Gamma.dist(alpha=alpha, beta=beta, size=size)

to avoid a hard boundary, I get sampling errors.

Here is my full code:

def gamma_noise_regression(mu, alpha, beta, size=None):
    return 4020 + pm.math.log1pexp(mu) + pm.Gamma.dist(alpha=alpha, beta=beta, size=size)


with pm.Model() as model:  
    # Define priors

    const = pm.Normal("const",
                       mu=7_000,
                       sigma=1_000)

    coef = pm.Uniform("coef", lower=-1, upper=0.0) 

    mean = X * coef + const

    gamma_alpha = pm.Gamma("alpha", alpha=200.0, beta=10.0)
    gamma_beta = pm.Gamma("beta", alpha=100.0, beta=10.0)
   

    # Define likelihood
    
    likelihood = pm.CustomDist('y', mean, gamma_alpha, gamma_beta, 
                               dist=gamma_noise_regression, 
                               observed=y)

    # Inference!
    # draw posterior samples using NUTS sampling
    chain_number = 2
    idata = pm.sample(SAMPLE_NUMBER//chain_number, 
                      chains=chain_number, 
                      tune=5_000, 
                      )
# SamplingError: Initial evaluation of model at starting point failed!
# Starting values:
# {'const': array(7000.75080922), 'coef_interval__': array(-0.98534689), 'alpha_log__': array(3.41875223), 'beta_log__': array(2.06907241)}

# Logp initial evaluation results:
# {'const': -7.83, 'coef': -1.62, 'alpha': -18.98, 'beta': -1.14, 'y': -inf}

EDIT: I don’t have the proper vocabulary to explain this, but the issue with the results in the second figure is this: if mu is sampled to be 1,000 and the Gamma distribution returns a sample of 4,000, the sum mu + gamma = 5,000 is feasible under my model, but flooring mu at 4,020 rules out that combination (since mu is smaller than 4,020 here, it gets pushed up to 4,020). It is the sum that needs to be truncated, not mu. That is why I feel implementing my own logp for this sum is my best bet, but I don’t know how to do that.

Understood. If you want to do a truncated custom logp, there is a general formula for the PDF, \frac{g(x)}{F(b) - F(a)}, where g(x) is the PDF of the un-truncated distribution if x \in [a, b] and zero otherwise, and F(x) is the CDF.

The shifted gamma \mu + \mathrm{Gamma}(\alpha, \beta) has a known PDF and CDF of f(x - \mu; \alpha, \beta) – that is, you just shift the input by \mu and then compute the usual gamma PDF/CDF. You can play around with scipy to confirm this.
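
For example, a quick scipy check (arbitrary numbers; note scipy’s scale is 1/beta for PyMC’s rate parameterization):

import numpy as np
from scipy import stats

alpha, beta, mu = 2.0, 0.5, 10.0
x = 13.0

shifted = stats.gamma(a=alpha, loc=mu, scale=1 / beta)  # gamma shifted by mu
base = stats.gamma(a=alpha, scale=1 / beta)             # un-shifted gamma

print(np.isclose(shifted.pdf(x), base.pdf(x - mu)))  # True
print(np.isclose(shifted.cdf(x), base.cdf(x - mu)))  # True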

Since scipy also gives you the inverse CDF for a loc-scale gamma, you can also implement the random function using inverse transform sampling. That will let you do prior/posterior predictive checks.
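
Something along these lines, for left truncation at a point a (a sketch only, not tested; check the exact signature pm.CustomDist expects for its random argument before wiring it in):

import numpy as np
from scipy import stats

def truncated_shifted_gamma_rng(mu, alpha, beta, a=4020.0, rng=None, size=None):
    rng = np.random.default_rng() if rng is None else rng
    base = stats.gamma(a=alpha, scale=1 / beta)
    # mass of the shifted gamma that falls below the truncation point
    F_a = base.cdf(a - mu)
    # inverse transform sampling restricted to the interval [F(a - mu), 1)
    u = rng.uniform(low=F_a, high=1.0, size=size)
    return mu + base.ppf(u)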

This sounds like what I wanted so I’ll give it a try, thank you!

So I am at an impasse again; here is the implementation I have so far

def gamma_noise_regression(mu, alpha, beta, size=None):
    combined_value = mu + pm.Gamma.dist(alpha=alpha, beta=beta, size=size)
    alternate_value = pm.DiracDelta.dist(4020.0, size=size)
    return pm.math.switch(combined_value >= 4020.0, combined_value, alternate_value)


def logp_gamma_noise_regression(value, mu, alpha, beta):
    total_result = pm.Gamma.dist(alpha=alpha, beta=beta)

    # Numerator: density of the un-truncated shifted gamma at the observed value
    value1 = pm.math.exp(pm.logp(total_result, value - mu))

    # Denominator: 1 - F(a - mu), the probability mass above the truncation point
    cdf_trunc = pm.math.exp(pm.logcdf(total_result, 4020 - mu))
    value2 = 1 - cdf_trunc

    logp_result = pm.math.log(value1 / value2)

    return pm.math.switch(value >= 4020.0,
                          logp_result, pm.logp(pm.DiracDelta.dist(4020.0), value))

However, when running my code I get the error:

Logp initial evaluation results:
{'const': -7.83, 'coef': -0.73, 'alpha': 0.57, 'beta': 0.22, 'y': nan}

Is there something off in my computation of the log pdf? Since I’m only truncating from the left, the formula should be f(x | X > a) = \frac{g(x)}{1 - F(a)}
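
(For what it’s worth, the same formula can also be written directly in log space, which avoids the exp/log round trip. This is only a sketch and assumes pm.math.log1mexp computes log(1 - exp(x)) for x <= 0; check the sign convention in your PyMC version:)

def logp_gamma_noise_regression(value, mu, alpha, beta):
    base = pm.Gamma.dist(alpha=alpha, beta=beta)
    log_num = pm.logp(base, value - mu)                          # log g(value - mu)
    log_denom = pm.math.log1mexp(pm.logcdf(base, 4020.0 - mu))   # log(1 - F(a - mu))
    return log_num - log_denom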

One helpful thing for debugging is to insert pytensor Print Ops into your code so you can see what everything is evaluating to. Something like this:

import pytensor

def logp_gamma_noise_regression(value, mu, alpha, beta):
    total_result = pm.Gamma.dist(alpha=alpha, beta=beta)

    value1 = pm.math.exp(pm.logp(total_result, value - mu))
    value1 = pytensor.printing.Print('value1')(value1)

    cdf_trunc = pm.math.exp(pm.logcdf(total_result, 4020 - mu))
    cdf_trunc = pytensor.printing.Print('cdf_trunc')(cdf_trunc)
    value2 = 1 - cdf_trunc

    logp_result = pm.math.log(value1 / value2)
    logp_result = pytensor.printing.Print('logp_result')(logp_result)

    return pm.math.switch(value >= 4020.0,
                          logp_result, pm.logp(pm.DiracDelta.dist(4020.0), value))

This will spam out the values of these variables each time they are used in a computation. My suspicion is that, given the start values of alpha and beta, the mass of the total_result distribution is well below 4020, so the CDF at 4020 is 1. This would make the denominator of logp_result zero, and you get the nan. Looking at the print outputs will confirm it, though.

If I’m right, there is a nice utility called pm.find_constrained_prior, which will use an optimizer to find parameters such that 95% of a distribution’s mass is inside a requested range. You could use it to make sure your alpha and beta priors don’t put the truncation region too far out into the tail of the distribution.
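
A rough sketch of how that might look (the bounds and initial guess here are made-up numbers, just for illustration):

import pymc as pm

# find alpha, beta such that ~95% of the Gamma noise mass lies in [100, 3000],
# so that mu + noise can comfortably reach past the truncation point
params = pm.find_constrained_prior(
    pm.Gamma,
    lower=100,
    upper=3000,
    init_guess={"alpha": 2.0, "beta": 0.01},
    mass=0.95,
)
print(params)  # {'alpha': ..., 'beta': ...}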

Yes, you were right. I had to play around with my priors and starting values a bit (using the initvals argument to pm.sample) to find a reasonable set of starting values. For some values I also got -inf because my Gamma distribution did not have a long enough tail to accommodate large values.
I did not get a chance to use find_constrained_prior yet, but I will give it a shot.
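
For reference, passing starting values looks roughly like this (the numbers are placeholders, not recommendations):

idata = pm.sample(
    SAMPLE_NUMBER // chain_number,
    chains=chain_number,
    tune=5_000,
    initvals={"const": 7_000.0, "coef": -0.5, "alpha": 20.0, "beta": 10.0},
)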