Consider the following data generating process from a Hurdle Gamma model.
```python
import numpy as np
import pymc as pm
import arviz as az
import pandas as pd


def simulate_data(n_observations):
    x = np.random.normal(size=n_observations)
    mu = np.exp(1.5 + 0.5 * x)
    sigma = 1
    psi = 0.8
    dist = pm.HurdleGamma.dist(mu=mu, sigma=sigma, psi=psi)
    y = pm.draw(dist)
    return pd.DataFrame(dict(x=x, y=y))


df = simulate_data(180)
```
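As a sanity check on the simulated data (in `HurdleGamma`, `psi = 0.8` is the probability of clearing the hurdle, so roughly 20% of observations should be exactly zero):

```python
# With psi = 0.8, about 20% of the simulated y values should be exactly zero
print((df.y == 0).mean())
```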
I’ve written the model below, but it suffers from divergences: nearly all draws are divergent, which is worrying given that the model is simple and the data generating process is correct. It appears that the chains are getting “stuck” (i.e. not moving away from their initialized values).
```python
x = df.x.values
y = df.y.values

with pm.Model() as model:
    X = pm.Data("X", df.x.values, dims="ix")
    Y = pm.Data("Y", y)

    b0 = pm.Normal("b0", 0, 1)
    b1 = pm.Normal("b1", 0, 1)
    eta = b0 + b1 * X
    mu = pm.math.exp(eta)

    sigma = pm.Exponential("sigma", 1)
    psi = pm.Uniform("psi", 0, 1)

    Yobs = pm.HurdleGamma("Yobs", mu=mu, sigma=sigma, psi=psi, observed=y)

with model:
    idata = pm.sample()
    idata.extend(pm.sample_posterior_predictive(idata))
```
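For reference, this is the kind of diagnostic snippet I mean when I say nearly all draws are divergent (just a check on the trace, not part of the model):

```python
# Count divergent transitions across all chains
n_div = int(idata.sample_stats["diverging"].sum())
n_draws = idata.posterior.sizes["chain"] * idata.posterior.sizes["draw"]
print(f"{n_div} of {n_draws} draws were divergent")
```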
However, when I write the model as a mixture (below), it samples fine:
```python
x = df.x.values
y = df.y.values

with pm.Model() as model:
    X = pm.Data("X", df.x.values, dims="ix")
    Y = pm.Data("Y", y)

    b0 = pm.Normal("b0", 0, 1)
    b1 = pm.Normal("b1", 0, 1)
    eta = b0 + b1 * X
    mu = pm.math.exp(eta)

    sigma = pm.Exponential("sigma", 1)
    w = pm.Dirichlet("w", a=np.array([1, 1]))

    components = [
        pm.DiracDelta.dist(0.0),
        pm.Gamma.dist(mu=mu, sigma=sigma),
    ]
    like = pm.Mixture("like", w=w, comp_dists=components, observed=Y)

with model:
    idata = pm.sample()
    idata.extend(pm.sample_posterior_predictive(idata))
```
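A quick way to compare the posterior from this model against the simulation values (b0 = 1.5, b1 = 0.5, sigma = 1, and mixture weights of roughly [0.2, 0.8] given psi = 0.8, since the first component is the point mass at zero):

```python
# Compare posterior summaries to the values used in simulate_data:
# b0 = 1.5, b1 = 0.5, sigma = 1, and w ~= [1 - psi, psi] = [0.2, 0.8]
az.summary(idata, var_names=["b0", "b1", "sigma", "w"])
```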
The mixture model is fairly similar to how I would write the model in Stan, and the one key difference I’ve found between this approach and `HurdleGamma` is that PyMC seems to divide the Gamma density by a function of the density evaluated at machine epsilon.
It seems strange that the latter model should sample well whereas the former should not. I tried replacing the Gamma density in the model that samples well with a truncated Gamma density (truncated at machine epsilon) via `pm.Truncated.dist(pm.Gamma.dist(mu=mu, sigma=sigma), lower=np.finfo(pytensor.config.floatX).eps)`, which should mimic how `HurdleGamma` is implemented. The model then started to diverge in the same way as the first model.
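For completeness, here is a minimal sketch of that experiment, assuming the same data, priors, and mixture structure as the second model above; the only change is swapping the plain Gamma component for the epsilon-truncated one:

```python
import numpy as np
import pymc as pm
import pytensor

eps = np.finfo(pytensor.config.floatX).eps

with pm.Model() as truncated_model:
    X = pm.Data("X", df.x.values, dims="ix")
    Y = pm.Data("Y", df.y.values)

    b0 = pm.Normal("b0", 0, 1)
    b1 = pm.Normal("b1", 0, 1)
    mu = pm.math.exp(b0 + b1 * X)
    sigma = pm.Exponential("sigma", 1)
    w = pm.Dirichlet("w", a=np.array([1, 1]))

    components = [
        pm.DiracDelta.dist(0.0),
        # Same Gamma component as before, but truncated just above zero,
        # which (as far as I can tell) is what HurdleGamma does internally
        pm.Truncated.dist(pm.Gamma.dist(mu=mu, sigma=sigma), lower=eps),
    ]
    like = pm.Mixture("like", w=w, comp_dists=components, observed=Y)

    idata_trunc = pm.sample()
```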
I’m suspicious of the implementation of `HurdleGamma` as a result of this, and I’m not quite sure why the machine epsilon part is being added. If we write the model using a latent Bernoulli variable $z$ with probability of success $\psi$, the log likelihood for the model is
$$
\mathcal{L} =
\begin{cases}
\log(1-\psi), & z = 0 \\
\log(\psi) + \log(\text{Gamma density}), & z = 1
\end{cases}
$$
This log likelihood, as I understand it, is what is implemented in my second model and in my Stan model (you can read about the Stan model here, or see the hurdle Gamma density code which brms outputs here).
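To make the piecewise expression concrete, here is a hypothetical sketch of coding that log likelihood directly with `pm.Potential`, splitting the data into zero and positive observations (the variable names and model structure are my own, not from PyMC):

```python
import numpy as np
import pymc as pm

is_zero = df.y.values == 0          # observations at the hurdle
pos_idx = np.flatnonzero(~is_zero)  # indices of positive observations
y_pos = df.y.values[pos_idx]

with pm.Model() as direct_model:
    X = pm.Data("X", df.x.values, dims="ix")

    b0 = pm.Normal("b0", 0, 1)
    b1 = pm.Normal("b1", 0, 1)
    mu = pm.math.exp(b0 + b1 * X)
    sigma = pm.Exponential("sigma", 1)
    psi = pm.Uniform("psi", 0, 1)

    # z = 0 branch: each zero contributes log(1 - psi)
    pm.Potential("logp_zero", is_zero.sum() * pm.math.log(1 - psi))

    # z = 1 branch: each positive value contributes log(psi) + Gamma log-density
    gamma_logp = pm.logp(pm.Gamma.dist(mu=mu[pos_idx], sigma=sigma), y_pos)
    pm.Potential("logp_pos", pos_idx.size * pm.math.log(psi) + gamma_logp.sum())

    idata_direct = pm.sample()
```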
Could someone answer the following for me:
- Why is the machine epsilon part added in the `HurdleGamma` implementation?
- Is there something in my first model that I can fix so that it samples well (like my second)?