I’m observing a continuous variable that has some probability of being non-zero; the non-zero values are
expected to be normally distributed. Here’s a simulated example:
    import numpy as np
    import pandas as pd

    np.random.seed(42)
    true_p_non_zero = 0.3
    true_mu = 1234.5
    true_sd = 1.2
    data = pd.DataFrame({'non_zero': np.random.rand(100) < true_p_non_zero})
    y = np.random.randn(len(data)) * true_sd + true_mu
    y[~data.non_zero] = 0  # zero out the observations that are "off"
    data['y'] = y
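(As a quick check that the simulation does what I intend, the empirical summaries should sit near the true values:)

```python
import numpy as np
import pandas as pd

# Regenerate the simulated data from above
np.random.seed(42)
true_p_non_zero = 0.3
true_mu = 1234.5
true_sd = 1.2
data = pd.DataFrame({'non_zero': np.random.rand(100) < true_p_non_zero})
y = np.random.randn(len(data)) * true_sd + true_mu
y[~data.non_zero] = 0
data['y'] = y

print(data.non_zero.mean())                  # empirical fraction of non-zeros, near 0.3
print(data.loc[data.non_zero, 'y'].mean())   # sample mean of the non-zero part, near 1234.5
print(data.loc[data.non_zero, 'y'].std())    # sample sd of the non-zero part, near 1.2
```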
My goal is to estimate the probability of being non-zero and the parameters of the normal distribution.
The first part is easy:
    import pymc3 as pm

    with pm.Model() as model:
        p_non_zero = pm.Uniform('p_non_zero', lower=0, upper=1)
        non_zero = pm.Bernoulli('non_zero', p=p_non_zero, observed=data.non_zero)
        trace = pm.sample(100, tune=50)
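(A sanity check on this first part: a Uniform prior is a Beta(1, 1), so by conjugacy the posterior for p_non_zero is available in closed form as Beta(1 + k, 1 + n − k), where k is the number of non-zero observations out of n. The MCMC estimate should agree with this, independent of PyMC3:)

```python
import numpy as np

# Regenerate the non-zero indicator from the simulation above
np.random.seed(42)
non_zero = np.random.rand(100) < 0.3

# With a Uniform (= Beta(1, 1)) prior on p and Bernoulli observations,
# the posterior is Beta(1 + k, 1 + n - k); its mean is (1 + k) / (2 + n).
k, n = non_zero.sum(), non_zero.size
posterior_mean = (1 + k) / (2 + n)
print(posterior_mean)  # should match the posterior mean of p_non_zero in the trace
```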
The problem is the next step: I can’t figure out how to link the normal likelihood to the latent non-zero indicator.
I tried the following code, but it produces wrong estimates:
    with pm.Model() as model:
        p_non_zero = pm.Beta('p_non_zero', alpha=1, beta=1)
        non_zero = pm.Bernoulli('non_zero', p=p_non_zero)
        mu = pm.Uniform('mu', lower=1000, upper=2000)
        sd = pm.Uniform('sd', lower=0, upper=5)
        _mu = pm.math.switch(non_zero, 0, mu)
        _sd = pm.math.switch(non_zero, 0, sd)
        y = pm.Normal('y', mu=_mu, sd=_sd, observed=data['y'])
        trace = pm.sample(500, tune=150)
The following approach works, but it feels ad hoc and doesn’t generalize well:
    with pm.Model() as model:
        sel_non_zero = data.y != 0
        p_non_zero = pm.Uniform('p_non_zero', lower=0, upper=1)
        non_zero = pm.Bernoulli('non_zero', p=p_non_zero, observed=sel_non_zero)
        mu = pm.Uniform('mu', lower=1000, upper=2000)
        sd = pm.Uniform('sd', lower=0, upper=5)
        # fit the normal only to the observations that are actually non-zero
        y = pm.Normal('y', mu=mu, sd=sd, observed=data.loc[sel_non_zero, 'y'])
        trace = pm.sample(500, tune=150)
Again, this approach works, but I suspect there are better or more elegant solutions. Is there a standard way to express this kind of zero-inflated normal model in PyMC3?
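For what it’s worth, my current understanding of why the last model behaves is that the mixture indicator can be marginalized out exactly: a zero observation contributes log(1 − p) to the likelihood, and a non-zero one contributes log(p) plus the normal log-density. A minimal numpy sketch of that marginalized log-likelihood (the same function could presumably be wrapped in something like pm.DensityDist, though I haven’t verified that part):

```python
import numpy as np
import pandas as pd

# Regenerate the simulated data from above
np.random.seed(42)
data = pd.DataFrame({'non_zero': np.random.rand(100) < 0.3})
y = np.random.randn(len(data)) * 1.2 + 1234.5
y[~data.non_zero] = 0
data['y'] = y

def zi_normal_loglik(y, p, mu, sd):
    """Zero-inflated normal log-likelihood with the indicator marginalized out:
    zeros contribute log(1 - p); non-zeros contribute log(p) + Normal logpdf."""
    nz = y != 0
    normal_logpdf = (-0.5 * ((y[nz] - mu) / sd) ** 2
                     - np.log(sd) - 0.5 * np.log(2 * np.pi))
    return (np.sum(~nz) * np.log1p(-p)
            + nz.sum() * np.log(p)
            + normal_logpdf.sum())

# The likelihood at the true parameters should beat a perturbed mu
ll_true = zi_normal_loglik(data['y'].values, 0.3, 1234.5, 1.2)
ll_off = zi_normal_loglik(data['y'].values, 0.3, 1230.0, 1.2)
print(ll_true > ll_off)
```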