Question on how to model spike and slab priors

Hi,
I am new to the PyMC3 community, and I have a question about how to apply a spike-and-slab prior to variables.

Here is my model:

Y = X\beta + \epsilon
\beta_i \sim (1-\gamma_i)N(0, 0.001^2) + \gamma_i N(0, \sigma_1^2)
\gamma_i \sim \text{Bernoulli}(\pi)
\pi \sim \text{Beta}(1, 1)
\epsilon \sim N(0, \sigma_e^2)
\sigma_1 \sim \text{HalfCauchy}(5)
\sigma_e \sim \text{HalfCauchy}(5)

\beta and \gamma are vectors of length n, i.e. length(beta) == length(gamma) == n.

I cannot find examples that put a mixture prior on the coefficients, and I am not sure whether what I wrote is correct.

Here is my PyMC3 model:

import pymc3 as pm3
import theano.tensor as tt

with pm3.Model() as model:
    sigma_1 = pm3.HalfCauchy('sigma_1', 5)
    sigma_e = pm3.HalfCauchy('sigma_e', 5)
    pi = pm3.Beta('pi', 1, 1)
    gamma = pm3.Bernoulli('gamma', p=pi, shape=n)
    # spike sd = 0.001 when gamma_i = 0, slab sd = sigma_1 when gamma_i = 1
    mixture_sd = pm3.math.switch(gamma > 0.5, sigma_1, 0.001)
    beta = pm3.Normal('beta', mu=0, sigma=mixture_sd, shape=n)
    mu = tt.dot(X, beta)
    likelihood = pm3.Normal('y', mu=mu, sigma=sigma_e, observed=y)

The posterior means for \pi and \gamma are too small to be plausible. The sampler almost always reports that all of the \beta_i come from N(0, 0.001^2).

I would really appreciate it if someone could help me with this.

Thanks

Xing

Interesting, I posted a similar question yesterday.

What you specify can be modeled pretty easily in PyMC3:

def beta_spike_slab(shape, spike):
    inclusion_prop = 0.05
    beta_spike = pm.Normal('beta_spike', 0, spike, shape=shape)
    beta_slab = pm.Normal('beta_slab', 10, shape=shape)
    gamma = pm.Bernoulli('gamma', inclusion_prop, shape=shape)
    beta_spike_slab = pm.Deterministic('beta_spike_slab',
                                       (beta_spike * (1 - gamma)) + (beta_slab * gamma))
    return beta_spike_slab

but this leads to slow sampling, because the Bernoulli indicators would need to be marginalized out to sample efficiently.
I have not found a way to do this without divergence/sampling problems, so if anyone has any ideas, I would love to hear them!
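
For concreteness, the marginalized version I have in mind would look roughly like this. It is only a sketch: X, y and n are assumed to be defined, and the NormalMixture formulation is my own translation of the model above; in my experience the very narrow spike can still make sampling difficult.

import numpy as np
import pymc3 as pm
import theano.tensor as tt

with pm.Model() as marginalized_model:
    pi = pm.Beta('pi', 1, 1)
    sigma_1 = pm.HalfCauchy('sigma_1', 5)
    sigma_e = pm.HalfCauchy('sigma_e', 5)

    # with gamma summed out, each beta_i gets a two-component normal mixture prior:
    # spike N(0, 0.001^2) with weight (1 - pi), slab N(0, sigma_1^2) with weight pi
    w = tt.stack([1 - pi, pi])
    comp_sd = tt.stack([0.001, sigma_1])
    beta = pm.NormalMixture('beta', w=w, mu=np.zeros(2), sigma=comp_sd, shape=n)

    mu = tt.dot(X, beta)
    likelihood = pm.Normal('y', mu=mu, sigma=sigma_e, observed=y)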

Thank you very much. In your code, did you mean

beta_slab = pm.Normal('beta_slab', 0, 10, shape=shape)

In addition, did you spot anything wrong with my implementation?

Yes, that is what I meant.
I see you've updated your post with your model.

What are your sampling options? Because you cannot use NUTS for discrete variables, you need to (dramatically) increase your tuning steps. When I choose

trace = pm.sample(1000, tune=12000, nuts={"target_accept": 0.99})

with (this is adapted from your code)

sigma_1 = pm.HalfNormal('sigma_1', 10, shape=shape)
pi = pm.Beta('pi', 2, 7)
gamma = pm.Bernoulli('gamma', p=pi, shape=shape)
mixture_sd = pm.Deterministic('mixture_sd', pm.math.switch(gamma > 0.5, sigma_1, 0.001))
beta = pm.Normal('beta', mu=0, sigma=mixture_sd, shape=shape)

I do get the right beta coefficients with my toy dataset.

And I think the next step would be to not use math.switch, but to replace it with a continuous function like tt.nnet.sigmoid to improve sampling (see the Frequently Asked Questions); a rough sketch of what I mean is below.
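
This is only a sketch of that idea, not tested code: X, y and shape are assumed to be defined, and the Normal(0, 2) prior on the per-coefficient logit as well as the linear interpolation of the prior sd are my own choices.

import pymc3 as pm
import theano.tensor as tt

with pm.Model() as relaxed_model:
    sigma_1 = pm.HalfNormal('sigma_1', 10)
    sigma_e = pm.HalfCauchy('sigma_e', 5)

    # soft indicator in (0, 1): a latent logit per coefficient passed through a sigmoid
    logit_gamma = pm.Normal('logit_gamma', mu=0, sigma=2, shape=shape)
    inclusion = pm.Deterministic('inclusion', tt.nnet.sigmoid(logit_gamma))

    # smoothly interpolate the prior sd between the spike (0.001) and the slab (sigma_1)
    mixture_sd = 0.001 + (sigma_1 - 0.001) * inclusion

    beta = pm.Normal('beta', mu=0, sigma=mixture_sd, shape=shape)
    mu = tt.dot(X, beta)
    likelihood = pm.Normal('y', mu=mu, sigma=sigma_e, observed=y)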

I used NUTS for the continuous variables and Metropolis for gamma. I found it to be very slow.

A continuous version is much faster than the discrete one. Here is the code:

def conti_spike_slab(shape):
    mu_hat = 0
    sigma_hat = 10
    tau = 5
    # latent inclusion score; invlogit(lambda_hat) acts as a soft indicator in (0, 1)
    lambda_hat = pm.Normal('lambda_hat', mu=mu_hat, sigma=sigma_hat, shape=shape)
    # standard-normal base, scaled by tau and the soft indicator
    spike_raw = pm.Normal('spike_raw', mu=0, sigma=1, shape=shape)
    spike = pm.Deterministic('spike', tau * spike_raw * pm.invlogit(lambda_hat))
    return spike
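
To use it in the regression, something like this should work (a sketch only; X, y, n and the HalfCauchy noise prior are assumptions on my side):

with pm.Model() as model:
    sigma_e = pm.HalfCauchy('sigma_e', 5)
    beta = conti_spike_slab(shape=n)  # coefficients with the continuous spike-and-slab prior
    mu = pm.math.dot(X, beta)
    likelihood = pm.Normal('y', mu=mu, sigma=sigma_e, observed=y)
    trace = pm.sample(1000, tune=2000)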

Let me know what you guys think.


Perhaps the main point of the spike and slab prior is that it exactly zeros out many coefficients in the posterior samples. If you move to a continuous prior, you give that up. There are better continuous priors favoring many sparse entries. The horseshoe prior is one of them.
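
For reference, a plain horseshoe prior for the same regression could be written roughly like this (only a sketch; X, y, n and the noise model are carried over from the earlier posts):

import pymc3 as pm

with pm.Model() as horseshoe_model:
    sigma_e = pm.HalfCauchy('sigma_e', 5)

    # global-local shrinkage: tau pulls all coefficients toward zero,
    # while the heavy-tailed local scales lam_i let true signals escape
    tau = pm.HalfCauchy('tau', 1)
    lam = pm.HalfCauchy('lam', 1, shape=n)
    beta = pm.Normal('beta', mu=0, sigma=tau * lam, shape=n)

    mu = pm.math.dot(X, beta)
    likelihood = pm.Normal('y', mu=mu, sigma=sigma_e, observed=y)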


Unless you’re drawing a single sample, having exact 0s is not particularly useful. And if you want the MAP why are you bothering with sampling at all? Otherwise the general process is “take the trace and ask how often this coefficient was 0” – which can be effectively read off from lambda_hat anyway.
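
Something along these lines, assuming a trace from the discrete model (with 'gamma') or from the continuous one (with 'lambda_hat'); the 0.5 threshold is arbitrary:

import numpy as np

# discrete model: posterior inclusion probability is how often gamma_i == 1
inclusion_prob = trace['gamma'].mean(axis=0)

# continuous model: average the soft indicator invlogit(lambda_hat) per coefficient
soft_inclusion = (1 / (1 + np.exp(-trace['lambda_hat']))).mean(axis=0)

# coefficients considered "included" under the chosen threshold
selected = np.where(inclusion_prob > 0.5)[0]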

I recently read a paper (https://royalsocietypublishing.org/doi/pdf/10.1098/rsif.2018.0572) which claims that their continuous version of spike and slab outperforms the horseshoe prior; it is based on the inverse logit, which I implemented above.


Interesting. I stand corrected with regard to my statement about the horseshoe - I look forward to taking a look at that paper.


That’s a really interesting paper, will have to try it later, thanks!