HurdleGamma diverges, whereas a Mixture samples perfectly well

Consider the following data generating process from a Hurdle Gamma model.

import numpy as np
import pymc as pm
import arviz as az
import pandas as pd

def simulate_data(n_observations):
    x = np.random.normal(size=n_observations)
    mu = np.exp(1.5 + 0.5 * x)
    sigma = 1
    psi = 0.8
    dist = pm.HurdleGamma.dist(mu=mu, sigma=sigma, psi=psi)
    y = pm.draw(dist)

    return pd.DataFrame(dict(x=x, y=y))

df = simulate_data(180)

I’ve written the model below, but it suffers from divergences. Nearly all draws are divergent, which is worrying given that the model is simple and the data generating process matches it exactly. It appears that the chains are getting “stuck” (i.e. not moving away from their initial values).

x = df.x.values
y = df.y.values

with pm.Model() as model:
    X = pm.Data("X", x, dims="ix")
    Y = pm.Data("Y", y, dims="ix")

    # Linear predictor with a log link
    b0 = pm.Normal("b0", 0, 1)
    b1 = pm.Normal("b1", 0, 1)
    eta = b0 + b1 * X
    mu = pm.math.exp(eta)

    sigma = pm.Exponential("sigma", 1)
    psi = pm.Uniform("psi", 0, 1)
    Yobs = pm.HurdleGamma("Yobs", psi=psi, mu=mu, sigma=sigma, observed=Y)

with model:
    idata = pm.sample()
    idata.extend(pm.sample_posterior_predictive(idata))
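For reference, the number of divergent transitions can be counted from the sample stats:

n_divergent = int(idata.sample_stats["diverging"].sum())
n_total = idata.posterior.sizes["chain"] * idata.posterior.sizes["draw"]
print(f"{n_divergent} of {n_total} draws diverged")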

However, when I write the model as a mixture (below), it samples fine:

x = df.x.values
y = df.y.values

with pm.Model() as model:
    X = pm.Data("X", x, dims="ix")
    Y = pm.Data("Y", y, dims="ix")

    b0 = pm.Normal("b0", 0, 1)
    b1 = pm.Normal("b1", 0, 1)
    eta = b0 + b1 * X
    mu = pm.math.exp(eta)

    sigma = pm.Exponential("sigma", 1)

    # w[0] is the weight of the point mass at zero (1 - psi), w[1] that of the Gamma (psi)
    w = pm.Dirichlet("w", a=np.array([1, 1]))

    components = [
        pm.DiracDelta.dist(0.0),
        pm.Gamma.dist(mu=mu, sigma=sigma),
    ]

    like = pm.Mixture("like", w=w, comp_dists=components, observed=Y)

with model:
    idata = pm.sample()
    idata.extend(pm.sample_posterior_predictive(idata))

The mixture model is fairly similar to how I would write the model in Stan. The one key difference I’ve found between this approach and HurdleGamma is that PyMC seems to divide the Gamma density by a normalizing term evaluated at machine epsilon (i.e. it truncates the Gamma just above zero).

It seems strange that the latter model samples well while the former does not. In the model that samples well, I tried replacing the Gamma density with a Gamma density truncated at machine epsilon, using pm.Truncated.dist(pm.Gamma.dist(mu=mu, sigma=sigma), lower=np.finfo(pytensor.config.floatX).eps); this should mimic how HurdleGamma is implemented. The model then started to diverge just like the first one.
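For concreteness, the swapped-in component looked roughly like this (a sketch, with mu and sigma as defined in the mixture model above):

import pytensor

eps = np.finfo(pytensor.config.floatX).eps
components = [
    pm.DiracDelta.dist(0.0),
    # Gamma truncated just above zero, mimicking what HurdleGamma appears to do
    pm.Truncated.dist(pm.Gamma.dist(mu=mu, sigma=sigma), lower=eps),
]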

As a result I’m suspicious of the implementation of HurdleGamma, and I’m not quite sure why the machine epsilon part is being added. If we consider the model written with a latent Bernoulli variable z with probability of success \psi, the log likelihood for the model is

\mathcal{L} = \begin{cases} \log(1 - \psi), & z = 0 \\ \log(\psi) + \log(\text{Gamma density}), & z = 1 \end{cases}

This log likelihood, as I understand it, is what is implemented in my second model and in my Stan model (you can read about the Stan model here, or see the hurdle Gamma density code which brms outputs here).

Could someone answer the following for me:

  • Why is the machine epsilon part added in the HurdleGamma implementation?
  • Is there something in my first model that I can fix so that the model samples well (like my second)?

The mixture model is wrong, as it assumes the zeros could have come from either component, which is not true. It’s mixing densities and point masses.

That’s what the epsilon truncation tries to avoid, although it’s fundamentally just a hack to reuse the mixture implementation.
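To spell out the density/mass issue: at an observed zero, the naive mixture evaluates roughly

\log\big(w_0 \cdot 1 + w_1 \, f_{\text{Gamma}}(0)\big)

which adds a probability mass (w_0, from the point mass at zero) to a density value (w_1 f_{\text{Gamma}}(0)); the two are not on the same scale.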

As to why your model fails to sample, I’m not sure. Perhaps the priors are too diffuse or you don’t have enough data.

You can break the observations into a Gamma likelihood for the non-zeros and a Binomial (or Bernoulli) likelihood for the number of zeros, which should be equivalent to the hurdle model without the epsilon truncation hack (see the factorization below). Does that also fail to sample?
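For completeness, the equivalence follows from factorizing the hurdle log likelihood over the n observations, of which n_0 are zero:

\sum_{i=1}^{n} \log p(y_i) = n_0 \log(1 - \psi) + (n - n_0) \log\psi + \sum_{i:\, y_i > 0} \log \text{Gamma}(y_i \mid \mu_i, \sigma)

The first two terms are a Binomial likelihood in \psi, and the last is a Gamma likelihood over the non-zeros.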

There must be something broken in what we did; I suspect numerical stability or something like that. Here I’m doing what @ricardoV94 suggests, and everything works well.

import numpy as np
import pymc as pm
import arviz as az
import pandas as pd

def simulate_data(n_observations):
    x = np.random.normal(size=n_observations)
    mu = np.exp(1.5 + 0.5 * x)
    sigma = 1
    psi = 0.8
    dist = pm.HurdleGamma.dist(mu=mu, sigma=sigma, psi=psi)
    y = pm.draw(dist, random_seed=1111)

    return pd.DataFrame(dict(x=x, y=y))

df = simulate_data(180)

x = df.x.values
y = df.y.values

x_non_zero = x[y > 0]
y_non_zero = y[y > 0]
y_bernoulli = (y == 0) * 1.0  # indicator of the zeros

with pm.Model() as model:
    b0 = pm.Normal("b0", 0, 1)
    b1 = pm.Normal("b1", 0, 1)

    eta = b0 + b1 * x_non_zero
    mu = pm.math.exp(eta)

    sigma = pm.Exponential("sigma", 1)
    psi = pm.Uniform("psi", 0, 1)

    # Gamma likelihood for the positive observations, Bernoulli for the zero
    # indicator (psi is the probability of a non-zero, so zeros occur with 1 - psi)
    pm.Gamma("y_gamma", mu=mu, sigma=sigma, observed=y_non_zero)
    pm.Bernoulli("y_bernoulli", p=1 - psi, observed=y_bernoulli)

model.to_graphviz()

with model:
    idata = pm.sample(random_seed=1234)

az.plot_trace(idata, backend_kwargs={"layout": "constrained"});
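To check that the true values (b0 = 1.5, b1 = 0.5, sigma = 1, psi = 0.8) are recovered, one can also look at the posterior summary:

az.summary(idata, var_names=["b0", "b1", "sigma", "psi"])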


Note that changing the priors (even making them extremely tight around the true values) didn’t help at all; it only made the sampler slower, and there were still tons of divergences. I also tried changing the initial values, which didn’t help either.

We should investigate this further.

@tcapretto @ricardoV94

Thanks for this. Breaking the observed data into the Bernoulli and Gamma parts is a fine workaround and aligns with approaches I’ve taken in Stan before.

+1 on making the priors extremely tight around the true values. All this does is make sampling slower; the draws are still divergent.

Would you like me to open an issue on github?


Definitely. Thanks!


Issue here
