# The zero-inflated exponential

Hello !

I’m trying to model a process using a zero-inflated exponential. I’ve written up a log-probability function for it, and giving it a test-drive in theano, I believe it gives me what I’m expecting. However, sampling with PyMC3 returns strange results – the zero-inflation rate always comes out at zero.

``````# Fix parameters
lam = 3
p = 0.2

# Generate some zero-inflated exponential random numbers
data = np.random.exponential(1/lam, size=100000)
data[np.random.rand(len(data)) < p] = 0

print("Number of zeros : {}".format((data == 0).sum()))
print("Mean : {:.2f}".format(data.mean()))

# Define the log-likelihood of a zero-inflated exponential
def zero_inflated_exponential(lam, p, x):
"""
Returns the log-probability of an exponential distribution,
with a fixed probability of seeing a zero instead.
"""

return tt.switch(
tt.eq(x, 0.),
tt.log(p + lam - (lam * p)),
tt.log(lam - (lam * p)) - (lam * x)
)

with pm.Model() as m:

# Priors on lambda and zero rate
lam_ = pm.HalfNormal("lam", sd=2)
p_ = pm.Uniform("p", lower=0, upper=1, testval=0.1)

# Our custom log-prob function
observed = pm.DensityDist(
"observed",
zero_inflated_exponential,
observed={"lam": lam_, "p": p_, "x": data}
)

trace = pm.sample(1000, chains=4, tune=500)

pm.traceplot(trace, combined=True);
``````

Here’s the result.

Any help much appreciated in figuring out why this is happening. Note that when `p = 0`, the MCMC correctly infers the value of `lam`. What am I missing ?

Thanks so much.

I would imagine that this is pretty difficult to estimate as exponential distribution has large likelihood around 0 to begin with. I would try with much stronger prior for `p`.

Thank you for the insight ! I tried as you suggested. For a setup with `lam = 3` and `p = 0.3`, I tried using a Beta prior on `p` with some fairly extreme ( I believe ) prior knowledge baked in – I set `alpha = 24` and `beta = 56`, such that the mean of the prior on `p` was at exactly 0.3. That looks like this :

Unfortunately, this still doesn’t work :

Any other thoughts very much appreciated.

So perhaps this is it… I’ve calculated the likelihood of this data for all values of `p`, and that looks like this.

Seems like my likelihood function is probably wrong…

The likelihood was indeed wrong. The second term in the return of the `zero_inflated_exponential` function should be `tt.log(p)`, not `tt.log(p + lam - (lam * p))`.

Glad you found the solution