Question on how to model spike and slab priors

Hi,
I am new to the PyMC3 community, and I have a question on how to apply a spike-and-slab prior to variables.

Here is my model:

Y = X\beta + \epsilon
\beta_i \sim (1-\gamma_i) N(0, 0.001^2) + \gamma_i N(0, \sigma_1^2)
\gamma_i \sim \text{Bernoulli}(\pi)
\pi \sim \text{Beta}(1, 1)
\epsilon \sim N(0, \sigma_e^2)
\sigma_1 \sim \text{HalfCauchy}(5)
\sigma_e \sim \text{HalfCauchy}(5)

\beta and \gamma are vectors of length n; in other words, length(beta) == length(gamma) == n.

I cannot find examples that put a mixture prior on the model coefficients, and I am not sure whether what I wrote is correct.

Here is my PyMC3 model:

import pymc3 as pm3
import theano.tensor as tt

with pm3.Model() as model:
    sigma_1 = pm3.HalfCauchy('sigma_1', 5)
    sigma_e = pm3.HalfCauchy('sigma_e', 5)
    pi = pm3.Beta('pi', 1, 1)
    gamma = pm3.Bernoulli('gamma', p=pi, shape=n)
    # sd of beta_i: slab sd sigma_1 when gamma_i == 1, spike sd 0.001 otherwise
    mixture_sd = pm3.math.switch(gamma > 0.5, sigma_1, 0.001)
    beta = pm3.Normal('beta', mu=0, sigma=mixture_sd, shape=n)
    mu = tt.dot(X, beta)
    likelihood = pm3.Normal('y', mu=mu, sigma=sigma_e, observed=y)

The posterior means for \pi and \gamma are too small to be plausible. The sampler almost always reports that all of the \beta_i come from N(0, 0.001^2).

I would really appreciate it if someone could help me with this.

Thanks

Xing

Interesting, I posted a similar question yesterday.

What you specify can be modeled pretty easily in PyMC3:

def beta_spike_slab(shape, spike):
    inclusion_prop = 0.05
    beta_spike = pm.Normal('beta_spike', 0, spike, shape=shape)
    beta_slab = pm.Normal('beta_slab', 10, shape=shape)
    gamma = pm.Bernoulli('gamma', inclusion_prop, shape=shape)
    beta_spike_slab = pm.Deterministic(
        'beta_spike_slab', beta_spike * (1 - gamma) + beta_slab * gamma)
    return beta_spike_slab

but this leads to slow sampling, because the discrete Bernoulli variables cannot be handled by NUTS and would need to be marginalized out.
I have not found a way to do this without divergence/sampling problems, so if anyone has any ideas, I would love to hear them!
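
One way to avoid the discrete sampling entirely is to do that marginalization by hand: integrating gamma_i out turns the prior on each beta_i into a two-component normal mixture, which pm.NormalMixture can express directly, so everything left is continuous and NUTS can sample it. A minimal sketch, assuming X, y, and n as in the original post (marginal_model is just an illustrative name):

import numpy as np
import pymc3 as pm
import theano.tensor as tt

with pm.Model() as marginal_model:
    sigma_1 = pm.HalfCauchy('sigma_1', 5)
    sigma_e = pm.HalfCauchy('sigma_e', 5)
    pi = pm.Beta('pi', 1, 1)
    # Marginalizing gamma_i ~ Bernoulli(pi) gives
    # beta_i ~ (1 - pi) N(0, 0.001^2) + pi N(0, sigma_1^2)
    beta = pm.NormalMixture('beta',
                            w=tt.stack([1 - pi, pi]),
                            mu=np.zeros(2),
                            sigma=tt.stack([0.001, sigma_1]),
                            shape=n)
    mu = tt.dot(X, beta)
    y_obs = pm.Normal('y', mu=mu, sigma=sigma_e, observed=y)
    trace = pm.sample(1000, tune=2000, target_accept=0.95)

The trade-off is that the explicit gamma indicators disappear from the model; per-coefficient inclusion probabilities would have to be recovered afterwards from the posterior of beta and pi.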

Thank you very much. In your code, did you mean

beta_slab = pm.Normal('beta_slab', 0, 10, shape=shape)

In addition, did you spot anything wrong with my implementation?

Yes, that is what I meant.
I see you've updated your post with your model.

What are your sampling options? Because you cannot use NUTS for discrete variables, you need to (dramatically) increase your tuning steps. When I choose

trace = pm.sample(1000, tune=12000, target_accept=0.99)

with (this is adapted from your code)

sigma_1 = pm.HalfNormal('sigma_1', 10, shape=shape)
pi = pm.Beta('pi', 2, 7)
gamma = pm.Bernoulli('gamma', p=pi, shape=shape)
mixture_sd = pm.Deterministic('mixture_sd', pm.math.switch(gamma > 0.5, sigma_1, 0.001))
beta = pm.Normal('beta', mu=0, sigma=mixture_sd, shape=shape)

I do get the right beta coefficients with my toy dataset.

And I think the next step would be not to use pm.math.switch, but to replace it with a continuous function like tt.nnet.sigmoid to improve sampling; see the Frequently Asked Questions. A rough sketch of that idea follows (k, gamma_raw, and gamma_soft are made-up names, and the relaxation changes the model slightly, since the indicator is no longer exactly 0 or 1):
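
k = 50.0  # sharpness of the relaxation; larger k approaches the hard switch
gamma_raw = pm.Normal('gamma_raw', 0, 1, shape=shape)
gamma_soft = pm.Deterministic('gamma_soft', tt.nnet.sigmoid(k * gamma_raw))
# smoothly interpolate between the spike sd (0.001) and the slab sd (sigma_1)
mixture_sd = pm.Deterministic('mixture_sd', 0.001 + (sigma_1 - 0.001) * gamma_soft)
beta = pm.Normal('beta', mu=0, sigma=mixture_sd, shape=shape)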

I used NUTS for the continuous variables and Metropolis for gamma. I found it to be very slow.
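
For reference, this is roughly how that compound step can be written out explicitly (a sketch only; PyMC3 assigns a binary Metropolis step to Bernoulli variables automatically, so this mainly makes the choice visible):

with model:
    # discrete indicators get a binary Gibbs/Metropolis step ...
    step_gamma = pm.BinaryGibbsMetropolis([gamma])
    # ... and the remaining continuous variables are assigned NUTS
    trace = pm.sample(2000, tune=12000, step=[step_gamma])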