I’m trying to sample from a learned mixture of zeros and lognormal values, but so far none of my attempts have worked. I’ve generated a synthetic data set for testing, and all the code to work with it is available in this gist of a notebook.
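For reference, the synthetic data looks roughly like this (a sketch; the constants below are placeholders standing in for the actual values in the gist):

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder constants; the real values live in the gist.
P_ZERO = 0.3                  # assumed weight of the zero component
LOGMEAN, LOGSTD = 1.0, 0.5    # assumed lognormal parameters (log scale)
DELTA = 1e-3                  # small shift used to keep the data positive

N = 1000
is_zero = rng.random(N) < P_ZERO
samples = np.where(is_zero, 0.0, rng.lognormal(LOGMEAN, LOGSTD, N))
samples_log = np.log(samples + DELTA)   # log of the shifted data
```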
To summarize my issues so far: a mixture containing a lognormal component throws “Bad initial energy” if the observed data contains non-positive values, even when those values are supported by another component. That’s easy enough to work around in my case: I just add a small value, DELTA, to the data. The problem is that sampling still throws “Bad initial energy”, just later in the process.
```python
import pymc3 as pm

with pm.Model() as ln_model_test:
    # priors for the lognormal component
    mu_ln = pm.Normal("mu_ln", mu=LOGMEAN, sd=LOGSTD)
    sd_ln = pm.InverseGamma("sd_ln", mu=LOGSTD, sd=LOGSTD)
    nonzero = pm.Lognormal.dist(mu=mu_ln, sd=sd_ln)
    # point mass standing in for the zeros (shifted to DELTA)
    zero = pm.Constant.dist(DELTA)
    w = pm.Dirichlet("w", a=np.array([1, 1]))
    # likelihood on the shifted data
    log_obs = pm.Mixture("log_obs", w=w, comp_dists=[zero, nonzero],
                         observed=samples + DELTA)
    # unobserved copy of the mixture, for drawing new values
    log_dist = pm.Mixture("log_dist", w=w, comp_dists=[zero, nonzero])
    ln_trace = pm.sample(1000, tune=5000, cores=7)
```
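Incidentally, one way to narrow down where the “Bad initial energy” comes from (a minimal sketch; `check_test_point` is PyMC3’s built-in for evaluating each variable’s log-probability at the starting point):

```python
# A -inf or nan entry in this printout identifies the term that is
# breaking the initial energy computation.
print(ln_model_test.check_test_point())
```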
So I’ve abandoned the lognormal and instead take the log of the data:
```python
with pm.Model() as model_test:
    # priors for the normal component on the log scale
    mu = pm.Normal("mu", mu=LOGMEAN, sd=LOGSTD / len(samples_log))
    sd = pm.InverseGamma("sd", mu=LOGSTD, sd=LOGSTD)
    nonzero = pm.Normal.dist(mu=mu, sd=sd)
    # point mass at log(DELTA): where the zeros land after shift and log
    zero = pm.Constant.dist(np.log(0 + DELTA))
    w = pm.Dirichlet("w", a=np.array([1, 1]))
    obs = pm.Mixture("obs", w=w, comp_dists=[zero, nonzero],
                     observed=samples_log)
    # unobserved mixture, back-transformed to the original scale
    output = pm.Mixture("output", w=w, comp_dists=[zero, nonzero])
    # pm.math.exp, not np.exp: output is a tensor, so the transform
    # has to stay symbolic
    dist = pm.Deterministic("dist", pm.math.exp(output) - DELTA)
    trace = pm.sample(1000, tune=5000, cores=4)
```
This works well enough to learn the mixture weights and the lognormal parameters, but when I sample from `dist`, it only draws from the normal component:

(Figure: histogram of the posterior draws; peach = original data, blue = posterior samples.)
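For comparison, this is roughly what I expect sampling from the mixture to produce, drawn manually with NumPy from the posterior samples of `w`, `mu`, and `sd` (a sketch, using the trace from the model above):

```python
import numpy as np

rng = np.random.default_rng(1)

w_nonzero = trace["w"][:, 1]          # posterior weight of the normal component
pick_nonzero = rng.random(len(w_nonzero)) < w_nonzero

# component 0 is the point mass at log(DELTA), component 1 the normal
draws_log = np.where(
    pick_nonzero,
    rng.normal(trace["mu"], trace["sd"]),
    np.log(DELTA),
)
manual_dist = np.exp(draws_log) - DELTA   # back-transform; zeros stay zero
```

These manual draws include both the zeros and the lognormal tail, which is what I expected `dist` to give me.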
So it seems like there are two issues:
- Lognormal is not behaving as I expect
- Sampling from mixture distributions does not work as I expect
Any advice on this?