Sampling from a learned mixture of zeros and lognormal

I’m trying to sample from a learned mixture of zeros and lognormal values. So far, none of my solutions have worked. I’ve generated a synthetic set of data for testing, and have all the code to work with it available in this gist of a notebook.

To summarize my issues so far: mixtures with a lognormal component throw “Bad initial energy” errors if the data contains non-positive values, even if those values are supported by another component. That’s easy enough to work around in my case: I just add a small value (DELTA) to the data. The problem is that this still throws “Bad initial energy” errors, just later in the sampling process.
with pm.Model() as ln_model_test:

    mu_ln = pm.Normal("mu_ln", mu=LOGMEAN, sd=LOGSTD)
    sd_ln = pm.InverseGamma("sd_ln", mu=LOGSTD, sd=LOGSTD)
    nonzero = pm.Lognormal.dist(mu=mu_ln, sd=sd_ln)

    zero = pm.Constant.dist(DELTA)
    w = pm.Dirichlet('w', a=np.array([1, 1]))
    log_obs = pm.Mixture("log_obs", w=w, comp_dists=[zero, nonzero], observed=samples + DELTA)
    log_dist = pm.Mixture("log_dist", w=w, comp_dists=[zero, nonzero])

    ln_trace = pm.sample(1000, tune=5000, cores=7)
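
For debugging these failures, printing the log-probability of every variable at the model’s test point at least shows which term is responsible. A minimal check using the model above (check_test_point() is the built-in report):

# Log-probability of each variable at the model's initial (test) point;
# a -inf or nan entry is usually what is behind "Bad initial energy".
print(ln_model_test.check_test_point())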

So I’ve abandoned the lognormal and just take the log of the data:
with pm.Model() as model_test:

    mu = pm.Normal("mu", mu=LOGMEAN, sd=LOGSTD/len(samples_log))
    sd = pm.InverseGamma("sd", mu=LOGSTD, sd=LOGSTD)
    nonzero = pm.Normal.dist(mu=mu, sd=sd)

    zero = pm.Constant.dist(np.log(0 + DELTA))
    w = pm.Dirichlet('w', a=np.array([1, 1]))
    obs = pm.Mixture("obs", w=w, comp_dists=[zero, nonzero], observed=samples_log)
    output = pm.Mixture("output", w=w, comp_dists=[zero, nonzero])
    dist = pm.Deterministic("dist", pm.math.exp(output) - DELTA)  # exp on the tensor, not np.exp

    trace = pm.sample(1000, tune=5000, cores=4)

This works well enough to learn the mixture and lognormal parameters, but when I sample from dist, the draws come only from the normal component:
[image: peach = original data, blue = posterior samples from dist]

So it seems like there are two issues:

  1. Lognormal is not behaving as I expect it to
  2. Sampling from mixture distributions does not work as I expect.

Any advice on this?

Probable cause for the mixture sampling issue: The support of a mixture is the intersection of the supports for its distributions. I may try adding a small random amount to each datum instead of a fixed delta, and model it as a mixture of normals.
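
Roughly what I have in mind (a sketch only, not tested; the jitter scale, the width of the “zero” component, and the jitter_model / jitter_trace names are placeholders):

# Add a small random offset to every datum so the zeros are no longer a
# point mass, then model the logged data as a two-component normal mixture.
jitter = np.random.uniform(DELTA / 2, DELTA, size=samples.shape)
samples_jittered_log = np.log(samples + jitter)

with pm.Model() as jitter_model:
    mu = pm.Normal("mu", mu=LOGMEAN, sigma=LOGSTD)
    sd = pm.InverseGamma("sd", mu=LOGSTD, sigma=LOGSTD)
    nonzero = pm.Normal.dist(mu=mu, sigma=sd)

    # Narrow normal around log(DELTA), standing in for the point mass at zero.
    zero = pm.Normal.dist(mu=np.log(DELTA), sigma=0.5)

    w = pm.Dirichlet("w", a=np.array([1.0, 1.0]))
    obs = pm.Mixture("obs", w=w, comp_dists=[zero, nonzero],
                     observed=samples_jittered_log)

    jitter_trace = pm.sample(1000, tune=5000)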

Why not use a zero-inflated likelihood, and sample from pm.sample_posterior_predictive?

import numpy as np
import pymc3 as pm
import theano.tensor as tt


class ZeroInflatedLogNormal(pm.Lognormal):
    # Observed values below EPSILON are treated as the "zero" component.
    EPSILON = 2.0 ** (-10)

    def __init__(self, alpha=0.01, mu=0, sigma=None, tau=None, sd=None, *args, **kwargs):
        super().__init__(mu, sigma, tau, sd, *args, **kwargs)
        self.alpha = alpha               # probability of drawing a zero
        self.alpha_log = tt.log(alpha)
        self.oma_log = tt.log(1 - alpha)

    def random(self, point=None, size=None):
        # Draw lognormal values, then zero out a fraction alpha of them.
        raw = super().random(point, size)
        Q = pm.Bernoulli.dist(1 - self.alpha).random(size=raw.shape)
        return raw * Q

    def logp(self, value):
        # log(alpha) for (near-)zero values, log(1 - alpha) plus the
        # lognormal logp for everything else.
        return tt.switch(tt.lt(value, self.EPSILON),
                         self.alpha_log,
                         self.oma_log + super().logp(value))
    

with pm.Model() as mod:
    alpha = pm.Beta('inflation', 1, 10)
    mu = pm.Normal('mu', 0, 1)
    sd = pm.HalfNormal('sd', 1)
    ziln = ZeroInflatedLogNormal('X', alpha, mu, sd, observed=samples)
    tr = pm.sample(500, chains=6, cores=2, tune=750)
    tr2 = pm.sample(500, chains=1, cores=1, tune=750) # for post
    post = pm.sample_posterior_predictive(tr2)

from matplotlib import pyplot as plt
pm.traceplot(tr);
plt.figure()
sh = np.prod(post['X'].shape)
plt.hist(np.log(DELTA + post['X'].reshape((sh,))), 30);

[image: traceplot and histogram of the log posterior predictive draws]

I’m curious: Why do we need to use sample_posterior_predictive? I tried to create another ZeroInflatedLogNormal variable with the same parameters as ziln, but 0 never shows up in the trace.

You shouldn’t even have ziln (or X) in the trace, since it is given an observed=. sample_posterior_predictive uses the parameter draws in the trace (the variables without observed=) to sample from the data-generating distribution (the variables with observed=).
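
For instance, with the tr and post objects from the model above (just a quick check, nothing new defined):

# The observed RV "X" is never sampled by the step methods, so it is not in
# the trace; only the free parameters are.
print("X" in tr.varnames)        # False
# sample_posterior_predictive draws X through its random() method, which is
# where the exact zeros come from.
print((post["X"] == 0).mean())   # fraction of zeros among the draws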

Yes, I was sampling from a new variable. Below is the model I was using - log_dist displays the same sampling bias that dist does in the original post.

with pm.Model() as ln_model_test:
    
    mu_ln = pm.Normal("mu_ln", mu=LOGMEAN, sigma=LOGSTD)
    sd_ln = pm.InverseGamma("sd_ln", mu=LOGSTD, sigma=LOGSTD)
    alpha = pm.Beta('alpha', 1, 10)
    
    log_obs = ZeroInflatedLogNormal("log_obs", alpha=alpha, mu=mu_ln, sigma=sd_ln, observed=samples)
    log_dist = ZeroInflatedLogNormal("log_dist", alpha=alpha, mu=mu_ln, sigma=sd_ln)
    
    ln_trace = pm.sample(1000, tune=5000, cores=7)
import seaborn as sns

sns.distplot(np.log(ln_trace["log_dist"] + DELTA))
sns.distplot(np.log(samples + DELTA))

[image: distplots of log(log_dist + DELTA) and log(samples + DELTA)]

The problem with sampling from something like ZeroInflatedLogNormal when it is not observed is that it is a mixture of a discrete and a continuous variable. As such it is neither continuous nor discrete, and none of the samplers in pymc can deal with that, unfortunately. pm.Mixture should probably throw an error if it is created as a free variable. This is now https://github.com/pymc-devs/pymc3/issues/3582.
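
Concretely, for the model in the previous post that means dropping the free log_dist variable and getting the draws from sample_posterior_predictive instead. A rough sketch (ppc is just a name; everything else is as defined above):

with pm.Model() as ln_model_test:
    mu_ln = pm.Normal("mu_ln", mu=LOGMEAN, sigma=LOGSTD)
    sd_ln = pm.InverseGamma("sd_ln", mu=LOGSTD, sigma=LOGSTD)
    alpha = pm.Beta('alpha', 1, 10)

    # Only the observed variable; no free copy of the mixture.
    log_obs = ZeroInflatedLogNormal("log_obs", alpha=alpha, mu=mu_ln,
                                    sigma=sd_ln, observed=samples)

    ln_trace = pm.sample(1000, tune=5000, cores=7)
    # Drawn via random(), so the zero component is handled correctly.
    ppc = pm.sample_posterior_predictive(ln_trace)

sns.distplot(np.log(ppc["log_obs"].ravel() + DELTA))
sns.distplot(np.log(samples + DELTA))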

To follow up: Cases like these are where the sample_prior_predictive and sample_posterior_predictive methods are particularly useful, since they use the underlying random() implementation.
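
For example, prior predictive draws of X from the zero-inflated model further up also go through random(), zeros included (a small sketch; prior is just a name):

with mod:
    # Draws X (and the free parameters) from the prior via the random() methods.
    prior = pm.sample_prior_predictive(samples=1000)

plt.hist(np.log(DELTA + prior["X"].ravel()), 30)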

Also, why can’t you just change the Metropolis proposals?

class ZIMetropolis(pm.Metropolis):
    def __init__(self, *args, **kwargs):
        # Remember the last non-zero (uncensored) position of the chain.
        self.q0_uncensored = 0
        super().__init__(*args, **kwargs)

    def astep(self, q0):
        prev_accept = self.accepted
        # If the chain is sitting on the point mass, propose from the last
        # non-zero value instead of from zero.
        if q0 == 0:
            q0 = self.q0_uncensored
        else:
            self.q0_uncensored = q0
        q_new, stats = super().astep(q0)
        self.accepted = prev_accept
        # Randomly censor the proposal to zero with probability 0.1.
        Q = np.random.binomial(1, 0.9, size=q_new.shape)
        accept = self.delta_logp(q_new, q0)
        q_new, accepted = pm.arraystep.metrop_select(accept, Q*q_new, q0)
        self.accepted += accepted
        stats = {
            'tune': self.tune,
            'accept': np.exp(accept),
        }
        return q_new, [stats]
        
        
with pm.Model() as mod:
    alpha = 0.15
    mu = 2
    sd = 0.2
    ziln = ZeroInflatedLogNormal('X', alpha, mu, sd)
    tr = pm.sample(2000, chains=12, cores=2, tune=50, step=ZIMetropolis())

[image: resulting draws of X from the ZIMetropolis run]

Oh. Duh. Because a point mass (even $\epsilon\,\delta(x - x_0)$) doesn’t really make sense in the MH acceptance calculation, so you get this even for alpha=0.

Edit: Oh, and the point mass has to be at $-\infty$ for the lognormal, since the proposals are on the latent (log) space. So scratch this idea.
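
Writing the same argument out in symbols (nothing new, just the zero-inflated density from above): the target is the mixed measure

$$
p(x) = \alpha\,\delta(x) + (1 - \alpha)\,\operatorname{LogNormal}(x \mid \mu, \sigma),
$$

and on the latent scale $y = \log x$ the atom at $x = 0$ sits at $y = -\infty$, so a finite random-walk proposal never reaches it and the point mass never enters the acceptance ratio.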