# Question on how to model spike and slab priors

Hi,
I am new to the pymc3 community, and I have a question on how to apply spike and slab prior on variables.

Here is my model:

Y = X\beta + \epsilon
\beta_i \sim (1-\gamma_i)\,N(0, 0.001^2) + \gamma_i\,N(0, \sigma_1^2)
\gamma_i \sim \text{Bernoulli}(\pi)
\pi \sim \text{Beta}(1,1)
\epsilon \sim N(0, \sigma_e^2)
\sigma_1 \sim \text{HalfCauchy}(5)
\sigma_e \sim \text{HalfCauchy}(5)

\beta and \gamma are vectors of length n; in other words, length(beta) == length(gamma) == n.
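To make the generative story concrete, here is a small NumPy sketch that simulates data from this model. The dimensions and the fixed values of pi, sigma_1, and sigma_e below are made up for illustration; in the model above they have priors:

```python
import numpy as np

rng = np.random.default_rng(0)
n_obs, n_coef = 200, 10   # observations, coefficients (illustrative sizes)
pi = 0.3                  # inclusion probability (Beta(1,1) prior in the model)
sigma_1, sigma_e = 2.0, 0.5

gamma = rng.binomial(1, pi, size=n_coef)            # which betas come from the slab
beta = np.where(gamma == 1,
                rng.normal(0, sigma_1, size=n_coef),  # slab: N(0, sigma_1^2)
                rng.normal(0, 0.001, size=n_coef))    # spike: N(0, 0.001^2)
X = rng.normal(size=(n_obs, n_coef))
y = X @ beta + rng.normal(0, sigma_e, size=n_obs)
```

Simulating from the prior like this is also a cheap way to build a toy dataset for checking whether the sampler can recover which coefficients are in the slab.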

I cannot find examples that put a mixture prior on model variables, and I am not sure whether what I wrote is correct.

Here is my PyMC3 model:

```python
import pymc3 as pm3
import theano.tensor as tt

with pm3.Model():
    sigma_1 = pm3.HalfCauchy('sigma_1', 5)
    sigma_e = pm3.HalfCauchy('sigma_e', 5)
    pi = pm3.Beta('pi', 1, 1)
    gamma = pm3.Bernoulli('gamma', p=pi, shape=n)
    # switch between the spike sd (0.001) and the slab sd (sigma_1)
    mixture_sd = pm3.math.switch(gamma > 0.5, sigma_1, 0.001)
    beta = pm3.Normal('beta', mu=0, sigma=mixture_sd, shape=n)
    mu = tt.dot(X, beta)
    likelihood = pm3.Normal('y', mu=mu, sigma=sigma_e, observed=y)
```


The posterior means for \pi and \gamma are too small to be plausible: the model almost always reports that every \beta_i comes from N(0, 0.001^2).

I would really appreciate it if someone can help me on this.

Thanks

Xing

Interesting, I posted a similar question yesterday:

What you specify can be modeled fairly easily in PyMC3:

```python
def beta_spike_slab(shape, spike):
    inclusion_prop = 0.05
    beta_spike = pm.Normal('beta_spike', 0, spike, shape=shape)
    beta_slab = pm.Normal('beta_slab', 10, shape=shape)
    gamma = pm.Bernoulli('gamma', inclusion_prop, shape=shape)
    beta_spike_slab = pm.Deterministic('beta_spike_slab',
                                       (beta_spike * (1 - gamma)) + (beta_slab * gamma))
    return beta_spike_slab
```

but this leads to slow sampling, because one needs to marginalize over the Bernoulli distribution.
I have not found a way to do this without divergences/sampling problems, so if anyone has any ideas, I would love to hear them!
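For what it's worth, marginalizing each gamma_i analytically turns the prior on beta_i into a two-component normal mixture, which is continuous in everything the sampler sees. A small SciPy sketch of the marginalized log-density (the spike and slab scales below are illustrative):

```python
import numpy as np
from scipy.stats import norm

def spike_slab_logpdf(beta, pi, spike_sd=0.001, slab_sd=2.0):
    """Log-density of the marginalized spike-and-slab prior:
    (1 - pi) * N(0, spike_sd^2) + pi * N(0, slab_sd^2)."""
    dens = (1 - pi) * norm.pdf(beta, 0, spike_sd) + pi * norm.pdf(beta, 0, slab_sd)
    return np.log(dens)
```

Near zero the spike component dominates the density; away from zero only the slab contributes. In PyMC3 this marginalized prior can be expressed with pm.NormalMixture, so gamma never has to be sampled at all.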

Thank you very much. In your code, did you mean

```python
beta_slab = pm.Normal('beta_slab', 0, 10, shape=shape)
```


In addition, did you spot anything wrong with my implementation?

Yes, that is what I meant.

What are your sampling options? Because you cannot use NUTS for discrete variables, you need to (dramatically) increase your tuning steps. When I choose

```python
trace = pm.sample(1000, tune=12000, target_accept=0.99)
```

```python
sigma_1 = pm.HalfNormal('sigma_1', 10, shape=shape)
pi = pm.Beta('pi', 2, 7)
gamma = pm.Bernoulli('gamma', p=pi, shape=shape)
mixture_sd = pm.Deterministic('mixture_sd',
                              pm.math.switch(gamma > 0.5, sigma_1, 0.001))
beta = pm.Normal('beta', mu=0, sigma=mixture_sd, shape=shape)
```

I do get the right beta coefficients with my toy dataset.

And I think the next step would be to not use pm.math.switch, but to replace it with a continuous function like tt.nnet.sigmoid, to improve sampling; see Frequently Asked Questions.
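To illustrate the idea in plain NumPy (the spike/slab scales here are illustrative): the hard 0/1 switch is replaced by a sigmoid of an unconstrained continuous latent, so the mixture sd becomes a smooth, differentiable function of that latent:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def soft_mixture_sd(lam, spike_sd=0.001, slab_sd=2.0):
    """Smoothly interpolate between the spike and slab standard deviations.
    lam is an unconstrained continuous latent replacing the 0/1 gamma."""
    w = sigmoid(lam)            # soft inclusion weight in (0, 1)
    return (1 - w) * spike_sd + w * slab_sd
```

For very negative lam the sd approaches the spike scale, for very positive lam it approaches the slab scale, and NUTS can move between the two regimes along a continuous path.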

I used NUTS for the continuous variables and Metropolis for gamma. I found it to be very slow.

A continuous version is much faster than the discrete one. Here is the code:

```python
def conti_spike_slab(shape):
    mu_hat = 0
    sigma_hat = 10
    tau = 5
    lambda_hat = pm.Normal('lambda_hat', mu=mu_hat, sigma=sigma_hat, shape=shape)
    spike_raw = pm.Normal('spike_raw', mu=0, sigma=1, shape=shape)
    # invlogit(lambda_hat) acts as a continuous inclusion weight in (0, 1)
    spike = pm.Deterministic('spike', tau * spike_raw * pm.invlogit(lambda_hat))
    return spike
```
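Simulating the implied prior in plain NumPy (using the same hyperparameter values as in the function above) shows the characteristic spike-and-slab shape, with a lot of mass concentrated near zero plus a diffuse slab:

```python
import numpy as np

rng = np.random.default_rng(1)
n_draws = 100_000
mu_hat, sigma_hat, tau = 0.0, 10.0, 5.0   # same values as in conti_spike_slab

lam = rng.normal(mu_hat, sigma_hat, n_draws)
z = rng.normal(0.0, 1.0, n_draws)
invlogit = 1.0 / (1.0 + np.exp(-lam))     # continuous inclusion weight
spike = tau * z * invlogit                # prior draws of the coefficient
```

When lam is very negative the weight is near 0 and the draw collapses toward zero; when lam is positive the draw behaves like a wide normal, mimicking the slab.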


Let me know what you guys think.


Perhaps the main point of the spike and slab prior is that it exactly zeros out many coefficients in the posterior samples. If you move to a continuous prior, you give that up. There are better continuous priors favoring many sparse entries. The horseshoe prior is one of them.
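For comparison, a quick NumPy draw from a horseshoe-type prior (beta_i = tau * lambda_i * z_i, with lambda_i half-Cauchy local scales and a global scale tau fixed here at an illustrative value) shows the same qualitative shape: most mass very close to zero, with heavy tails for the few large coefficients:

```python
import numpy as np

rng = np.random.default_rng(2)
n_draws = 100_000
tau = 0.1                                   # global shrinkage (illustrative)
lam = np.abs(rng.standard_cauchy(n_draws))  # local half-Cauchy scales
beta = rng.normal(0.0, 1.0, n_draws) * lam * tau
```

In a full model tau would itself get a half-Cauchy prior rather than being fixed; it is held constant here only to keep the sketch short.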


Unless youâre drawing a single sample, having exact 0s is not particularly useful. And if you want the MAP why are you bothering with sampling at all? Otherwise the general process is âtake the trace and ask how often this coefficient was 0â â which can be effectively read off from lambda_hat anyway.

I recently read a paper (https://royalsocietypublishing.org/doi/pdf/10.1098/rsif.2018.0572) which claims that their continuous version of spike and slab outperforms the horseshoe prior; it is based on the inverse logit, which I implemented above.


Interesting. I stand corrected with regard to my statement about the horseshoe - I look forward to taking a look at that paper.


Thatâs a really interesting paper, will have to try it later, thanks!