Something must be broken in what we did; my guess is a numerical-stability issue. Here I do what @ricardoV94 suggests, and everything works well.
import numpy as np
import pymc as pm
import arviz as az
import pandas as pd
def simulate_data(n_observations):
    # The Gamma mean depends on x through a log link;
    # psi is the probability of a non-zero (Gamma) draw.
    x = np.random.normal(size=n_observations)
    mu = np.exp(1.5 + 0.5 * x)
    sigma = 1
    psi = 0.8
    dist = pm.HurdleGamma.dist(mu=mu, sigma=sigma, psi=psi)
    y = pm.draw(dist, random_seed=1111)
    return pd.DataFrame(dict(x=x, y=y))
df = simulate_data(180)
x = df.x.values
y = df.y.values
# Split the data: the Gamma part sees only the non-zero observations,
# the Bernoulli part sees the zero indicator.
x_non_zero = x[y > 0]
y_non_zero = y[y > 0]
y_bernoulli = (y == 0) * 1.0
with pm.Model() as model:
    b0 = pm.Normal("b0", 0, 1)
    b1 = pm.Normal("b1", 0, 1)
    eta = b0 + b1 * x_non_zero
    mu = pm.math.exp(eta)
    sigma = pm.Exponential("sigma", 1)
    psi = pm.Uniform("psi", 0, 1)
    # Fit the two components separately: a Gamma likelihood on the
    # non-zero values and a Bernoulli likelihood on the zero indicator,
    # with P(y == 0) = 1 - psi.
    pm.Gamma("y_gamma", mu=mu, sigma=sigma, observed=y_non_zero)
    pm.Bernoulli("y_bernoulli", p=1 - psi, observed=y_bernoulli)
model.to_graphviz()
with model:
    idata = pm.sample(random_seed=1234)
az.plot_trace(idata, backend_kwargs={"layout": "constrained"});
Note that changing the priors (even making them extremely tight around the true values) didn't help at all; it only made the sampler slower, and there were still tons of divergences. I also tried changing the initial values, which didn't help either.
We should investigate this further.


