The zero-inflated exponential

Hello !

I’m trying to model a process using a zero-inflated exponential. I’ve written up a log-probability function for it, and giving it a test-drive in theano, I believe it gives me what I’m expecting. However, sampling with PyMC3 returns strange results – the zero-inflation rate always comes out at zero.

# Fix parameters
lam = 3
p = 0.2

# Generate some zero-inflated exponential random numbers
data = np.random.exponential(1/lam, size=100000)
data[np.random.rand(len(data)) < p] = 0

print("Number of zeros : {}".format((data == 0).sum()))
print("Mean : {:.2f}".format(data.mean()))

# Define the log-likelihood of a zero-inflated exponential
def zero_inflated_exponential(lam, p, x):
    Returns the log-probability of an exponential distribution,
    with a fixed probability of seeing a zero instead.

    return tt.switch(
        tt.eq(x, 0.),
        tt.log(p + lam - (lam * p)),
        tt.log(lam - (lam * p)) - (lam * x)

with pm.Model() as m:

    # Priors on lambda and zero rate
    lam_ = pm.HalfNormal("lam", sd=2)
    p_ = pm.Uniform("p", lower=0, upper=1, testval=0.1)

    # Our custom log-prob function
    observed = pm.DensityDist(
        observed={"lam": lam_, "p": p_, "x": data}

    trace = pm.sample(1000, chains=4, tune=500)

pm.traceplot(trace, combined=True);

Here’s the result.


Any help much appreciated in figuring out why this is happening. Note that when p = 0, the MCMC correctly infers the value of lam. What am I missing ?

Thanks so much.

I would imagine that this is pretty difficult to estimate as exponential distribution has large likelihood around 0 to begin with. I would try with much stronger prior for p.

Thank you for the insight ! I tried as you suggested. For a setup with lam = 3 and p = 0.3, I tried using a Beta prior on p with some fairly extreme ( I believe ) prior knowledge baked in – I set alpha = 24 and beta = 56, such that the mean of the prior on p was at exactly 0.3. That looks like this :

Unfortunately, this still doesn’t work :

Any other thoughts very much appreciated.

So perhaps this is it… I’ve calculated the likelihood of this data for all values of p, and that looks like this.

Seems like my likelihood function is probably wrong…

The likelihood was indeed wrong. The second term in the return of the zero_inflated_exponential function should be tt.log(p), not tt.log(p + lam - (lam * p)).

Glad you found the solution :sweat_smile: