HurdleGamma diverges, whereas a Mixture samples perfectly well

Consider the following data generating process from a Hurdle Gamma model.

import numpy as np
import pymc as pm
import arviz as az
import pandas as pd

def simulate_data(n_observations):
    x = np.random.normal(size=n_observations)
    mu = np.exp(1.5 + 0.5 * x)
    sigma = 1
    psi = 0.8
    dist = pm.HurdleGamma.dist(mu=mu, sigma=sigma, psi=psi)
    y = pm.draw(dist)

    return pd.DataFrame(dict(x=x, y=y))

df = simulate_data(180)

I’ve written the model below, but it suffers from divergences. Nearly all draws are divergent, which is worrying given that the model is simple and the data generating process matches it exactly. It appears that the chains are getting “stuck” (i.e. not moving away from their initial values).

x = df.x.values
y = df.y.values

with pm.Model() as model:
    X = pm.Data("X", x, dims="ix")
    Y = pm.Data("Y", y, dims="ix")

    # Linear predictor with a log link
    b0 = pm.Normal("b0", 0, 1)
    b1 = pm.Normal("b1", 0, 1)
    eta = b0 + b1 * X
    mu = pm.math.exp(eta)

    sigma = pm.Exponential("sigma", 1)
    psi = pm.Uniform("psi", 0, 1)
    Yobs = pm.HurdleGamma("Yobs", psi=psi, mu=mu, sigma=sigma, observed=Y)

with model:
    idata = pm.sample()
    idata.extend(pm.sample_posterior_predictive(idata))
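For reference, the number of divergent transitions can be counted from the sample stats:

n_divergent = int(idata.sample_stats["diverging"].sum())
n_total = idata.posterior.sizes["chain"] * idata.posterior.sizes["draw"]
print(f"{n_divergent} of {n_total} draws diverged")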

However, when I write the model as a mixture (below), it samples fine:

x = df.x.values
y = df.y.values

with pm.Model() as model:
    X = pm.Data("X", x, dims="ix")
    Y = pm.Data("Y", y, dims="ix")

    b0 = pm.Normal("b0", 0, 1)
    b1 = pm.Normal("b1", 0, 1)
    eta = b0 + b1 * X
    mu = pm.math.exp(eta)

    sigma = pm.Exponential("sigma", 1)

    # w[0] is the weight of the point mass at zero (1 - psi), w[1] that of the Gamma (psi)
    w = pm.Dirichlet("w", a=np.array([1, 1]))

    components = [
        pm.DiracDelta.dist(0.0),
        pm.Gamma.dist(mu=mu, sigma=sigma),
    ]

    like = pm.Mixture("like", w=w, comp_dists=components, observed=Y)

with model:
    idata = pm.sample()
    idata.extend(pm.sample_posterior_predictive(idata))

The mixture model is fairly similar to how I would write the model in Stan. The one key difference I’ve found between this approach and HurdleGamma is that PyMC seems to divide the Gamma density by a normalizing term evaluated at machine epsilon (i.e. it truncates the Gamma just above zero).

It seems strange that the latter model samples well while the former does not. In the model that samples well, I tried replacing the Gamma density with a Gamma density truncated at machine epsilon, using pm.Truncated.dist(pm.Gamma.dist(mu=mu, sigma=sigma), lower=np.finfo(pytensor.config.floatX).eps); this should mimic how HurdleGamma is implemented. The model then started to diverge just like the first one.
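For concreteness, the swapped-in component looked roughly like this (a sketch, with mu and sigma as defined in the mixture model above):

import pytensor

eps = np.finfo(pytensor.config.floatX).eps
components = [
    pm.DiracDelta.dist(0.0),
    # Gamma truncated just above zero, mimicking what HurdleGamma appears to do
    pm.Truncated.dist(pm.Gamma.dist(mu=mu, sigma=sigma), lower=eps),
]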

As a result I’m suspicious of the implementation of HurdleGamma, and I’m not quite sure why the machine epsilon part is being added. If we consider the model written with a latent Bernoulli variable z with probability of success \psi, the log likelihood for the model is

\mathcal{L} = \begin{cases} \log(1 - \psi), & z = 0 \\ \log(\psi) + \log(\text{Gamma density}), & z = 1 \end{cases}

This log likelihood, as I understand it, is what is implemented in my second model and in my Stan model (you can read about the Stan model here, or see the hurdle Gamma density code which brms outputs here).

Could someone answer the following for me:

  • Why is the machine epsilon part added in the HurdleGamma implementation?
  • Is there something in my first model that I can fix so that the model samples well (like my second)?

The mixture model is wrong, as it assumes the zeros could have come from either component, which is not true. It’s mixing densities and point masses.

That’s what the epsilon truncation tries to avoid, although it’s fundamentally just a hack to reuse the mixture implementation.
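To spell out the density/mass issue: at an observed zero, the naive mixture evaluates roughly

\log\big(w_0 \cdot 1 + w_1 \, f_{\text{Gamma}}(0)\big)

which adds a probability mass (w_0, from the point mass at zero) to a density value (w_1 f_{\text{Gamma}}(0)); the two are not on the same scale.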

As to why your model fails to sample, I’m not sure. Perhaps the priors are too diffuse or you don’t have enough data.

You can break the observations into a Gamma likelihood for the non-zeros and a Binomial (or Bernoulli) likelihood for the number of zeros, which should be equivalent to the hurdle model without the epsilon truncation hack (see the factorization below). Does that also fail to sample?
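For completeness, the equivalence follows from factorizing the hurdle log likelihood over the n observations, of which n_0 are zero:

\sum_{i=1}^{n} \log p(y_i) = n_0 \log(1 - \psi) + (n - n_0) \log\psi + \sum_{i:\, y_i > 0} \log \text{Gamma}(y_i \mid \mu_i, \sigma)

The first two terms are a Binomial likelihood in \psi, and the last is a Gamma likelihood over the non-zeros.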

There must be something broken in what we did; I suspect numerical stability or something like that. Here I’m doing what @ricardoV94 suggests, and everything works well.

import numpy as np
import pymc as pm
import arviz as az
import pandas as pd

def simulate_data(n_observations):
    x = np.random.normal(size=n_observations)
    mu = np.exp(1.5 + 0.5 * x)
    sigma = 1
    psi = 0.8
    dist = pm.HurdleGamma.dist(mu=mu, sigma=sigma, psi=psi)
    y = pm.draw(dist, random_seed=1111)

    return pd.DataFrame(dict(x=x, y=y))

df = simulate_data(180)

x = df.x.values
y = df.y.values

x_non_zero = x[y > 0]
y_non_zero = y[y > 0]
y_bernoulli = (y == 0) * 1.0  # indicator of the zeros

with pm.Model() as model:
    b0 = pm.Normal("b0", 0, 1)
    b1 = pm.Normal("b1", 0, 1)

    eta = b0 + b1 * x_non_zero
    mu = pm.math.exp(eta)

    sigma = pm.Exponential("sigma", 1)
    psi = pm.Uniform("psi", 0, 1)

    # Gamma likelihood for the positive observations, Bernoulli for the zero
    # indicator (psi is the probability of a non-zero, so zeros occur with 1 - psi)
    pm.Gamma("y_gamma", mu=mu, sigma=sigma, observed=y_non_zero)
    pm.Bernoulli("y_bernoulli", p=1 - psi, observed=y_bernoulli)

model.to_graphviz()

with model:
    idata = pm.sample(random_seed=1234)

az.plot_trace(idata, backend_kwargs={"layout": "constrained"});
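To check that the true values (b0 = 1.5, b1 = 0.5, sigma = 1, psi = 0.8) are recovered, one can also look at the posterior summary:

az.summary(idata, var_names=["b0", "b1", "sigma", "psi"])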


Note that changing the priors (even making them extremely tight around the true values) didn’t help at all; it only made the sampler slower, and there were still tons of divergences. I also tried changing the initial values, which didn’t help either.

We should investigate this further.

@tcapretto @ricardoV94

Thanks for this. Breaking the observed data into the Bernoulli and Gamma parts is a fine workaround and aligns with approaches I’ve taken in Stan before.

+1 on making the priors extremely tight around the true values. All this does is make sampling slower; the draws are still divergent.

Would you like me to open an issue on github?


Definitely. Thanks!


Issue here
