Question on how to model spike and slab priors

Hi,
I am new to the PyMC3 community, and I have a question about how to apply a spike-and-slab prior to variables.

Here is my model:

Y = X\beta + \epsilon
\beta_i \sim (1-\gamma_i)N(0, 0.001^2) + \gamma_i N(0, \sigma_1^2)
\gamma_i \sim \text{Bernoulli}(\pi)
\pi \sim \text{Beta}(1, 1)
\epsilon \sim N(0, \sigma_e^2)
\sigma_1 \sim \text{HalfCauchy}(5)
\sigma_e \sim \text{HalfCauchy}(5)

\beta and \gamma are vectors of length n, i.e. length(beta) == length(gamma) == n.

I cannot find examples that put a mixture prior on the coefficients, and I am not sure whether what I wrote is correct.

Here is my PyMC3 model:

import pymc3 as pm3
import theano.tensor as tt

with pm3.Model() as model:
    sigma_1 = pm3.HalfCauchy('sigma_1', 5)
    sigma_e = pm3.HalfCauchy('sigma_e', 5)
    pi = pm3.Beta('pi', 1, 1)
    gamma = pm3.Bernoulli('gamma', p=pi, shape=n)
    # spike sd = 0.001 when gamma_i = 0, slab sd = sigma_1 when gamma_i = 1
    mixture_sd = pm3.math.switch(gamma > 0.5, sigma_1, 0.001)
    beta = pm3.Normal('beta', mu=0, sigma=mixture_sd, shape=n)
    mu = tt.dot(X, beta)
    likelihood = pm3.Normal('y', mu=mu, sigma=sigma_e, observed=y)

The posterior means for \pi and \gamma are too small to be plausible. The sampler almost always reports that all of the \beta_i come from N(0, 0.001^2).

I would really appreciate it if someone could help me with this.

Thanks

Xing

Interesting, I posted a similar question yesterday.

What you specify can be modeled pretty easily in PyMC3:

def beta_spike_slab(shape, spike):
    inclusion_prop = 0.05
    beta_spike = pm.Normal('beta_spike', 0, spike, shape=shape)
    beta_slab = pm.Normal('beta_slab', 10, shape=shape)
    gamma = pm.Bernoulli('gamma', inclusion_prop, shape=shape)
    beta_spike_slab = pm.Deterministic('beta_spike_slab',
                                       (beta_spike * (1 - gamma)) + (beta_slab * gamma))
    return beta_spike_slab

but this leads to slow sampling, because the Bernoulli indicators would need to be marginalized out to sample efficiently.
I have not found a way to do this without divergence/sampling problems, so if anyone has any ideas, I would love to hear them!
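
For concreteness, the marginalized version I have in mind would look roughly like this. It is only a sketch: X, y and n are assumed to be defined, and the NormalMixture formulation is my own translation of the model above; in my experience the very narrow spike can still make sampling difficult.

import numpy as np
import pymc3 as pm
import theano.tensor as tt

with pm.Model() as marginalized_model:
    pi = pm.Beta('pi', 1, 1)
    sigma_1 = pm.HalfCauchy('sigma_1', 5)
    sigma_e = pm.HalfCauchy('sigma_e', 5)

    # with gamma summed out, each beta_i gets a two-component normal mixture prior:
    # spike N(0, 0.001^2) with weight (1 - pi), slab N(0, sigma_1^2) with weight pi
    w = tt.stack([1 - pi, pi])
    comp_sd = tt.stack([0.001, sigma_1])
    beta = pm.NormalMixture('beta', w=w, mu=np.zeros(2), sigma=comp_sd, shape=n)

    mu = tt.dot(X, beta)
    likelihood = pm.Normal('y', mu=mu, sigma=sigma_e, observed=y)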

Thank you very much. In your code, did you mean

beta_slab = pm.Normal('beta_slab', 0, 10, shape=shape)

In addition, did you spot anything wrong with my implementation?

Yes, that is what I meant.
I see you've updated your post with your model.

What are your sampling options? Because you cannot use NUTS for discrete variables, you need to (dramatically) increase your tuning steps. When I choose

trace = pm.sample(1000, tune=12000, nuts={"target_accept": 0.99})

with (this is adapted from your code)

sigma_1 = pm.HalfNormal('sigma_1', 10, shape=shape)
pi = pm.Beta('pi', 2, 7)
gamma = pm.Bernoulli('gamma', p=pi, shape=shape)
mixture_sd = pm.Deterministic('mixture_sd', pm.math.switch(gamma > 0.5, sigma_1, 0.001))
beta = pm.Normal('beta', mu=0, sigma=mixture_sd, shape=shape)

I do get the right beta coefficients with my toy dataset.

And I think the next step would be to not use math.switch, but to replace it with a continuous function like tt.nnet.sigmoid to improve sampling (see the Frequently Asked Questions); a rough sketch of what I mean is below.
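
This is only a sketch of that idea, not tested code: X, y and shape are assumed to be defined, and the Normal(0, 2) prior on the per-coefficient logit as well as the linear interpolation of the prior sd are my own choices.

import pymc3 as pm
import theano.tensor as tt

with pm.Model() as relaxed_model:
    sigma_1 = pm.HalfNormal('sigma_1', 10)
    sigma_e = pm.HalfCauchy('sigma_e', 5)

    # soft indicator in (0, 1): a latent logit per coefficient passed through a sigmoid
    logit_gamma = pm.Normal('logit_gamma', mu=0, sigma=2, shape=shape)
    inclusion = pm.Deterministic('inclusion', tt.nnet.sigmoid(logit_gamma))

    # smoothly interpolate the prior sd between the spike (0.001) and the slab (sigma_1)
    mixture_sd = 0.001 + (sigma_1 - 0.001) * inclusion

    beta = pm.Normal('beta', mu=0, sigma=mixture_sd, shape=shape)
    mu = tt.dot(X, beta)
    likelihood = pm.Normal('y', mu=mu, sigma=sigma_e, observed=y)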

I used NUTS for the continuous variables and Metropolis for gamma. I found it to be very slow.

A continuous version is much faster than the discrete one. Here is the code:

def conti_spike_slab(shape):
    mu_hat = 0
    sigma_hat = 10
    tau = 5
    # latent inclusion score; invlogit(lambda_hat) acts as a soft indicator in (0, 1)
    lambda_hat = pm.Normal('lambda_hat', mu=mu_hat, sigma=sigma_hat, shape=shape)
    # standard-normal base, scaled by tau and the soft indicator
    spike_raw = pm.Normal('spike_raw', mu=0, sigma=1, shape=shape)
    spike = pm.Deterministic('spike', tau * spike_raw * pm.invlogit(lambda_hat))
    return spike
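
To use it in the regression, something like this should work (a sketch only; X, y, n and the HalfCauchy noise prior are assumptions on my side):

with pm.Model() as model:
    sigma_e = pm.HalfCauchy('sigma_e', 5)
    beta = conti_spike_slab(shape=n)  # coefficients with the continuous spike-and-slab prior
    mu = pm.math.dot(X, beta)
    likelihood = pm.Normal('y', mu=mu, sigma=sigma_e, observed=y)
    trace = pm.sample(1000, tune=2000)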

Let me know what you guys think.


Perhaps the main point of the spike and slab prior is that it exactly zeros out many coefficients in the posterior samples. If you move to a continuous prior, you give that up. There are better continuous priors favoring many sparse entries. The horseshoe prior is one of them.
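
For reference, a plain horseshoe prior for the same regression could be written roughly like this (only a sketch; X, y, n and the noise model are carried over from the earlier posts):

import pymc3 as pm

with pm.Model() as horseshoe_model:
    sigma_e = pm.HalfCauchy('sigma_e', 5)

    # global-local shrinkage: tau pulls all coefficients toward zero,
    # while the heavy-tailed local scales lam_i let true signals escape
    tau = pm.HalfCauchy('tau', 1)
    lam = pm.HalfCauchy('lam', 1, shape=n)
    beta = pm.Normal('beta', mu=0, sigma=tau * lam, shape=n)

    mu = pm.math.dot(X, beta)
    likelihood = pm.Normal('y', mu=mu, sigma=sigma_e, observed=y)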


Unless you’re drawing a single sample, having exact 0s is not particularly useful. And if you want the MAP why are you bothering with sampling at all? Otherwise the general process is “take the trace and ask how often this coefficient was 0” – which can be effectively read off from lambda_hat anyway.
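
Something along these lines, assuming a trace from the discrete model (with 'gamma') or from the continuous one (with 'lambda_hat'); the 0.5 threshold is arbitrary:

import numpy as np

# discrete model: posterior inclusion probability is how often gamma_i == 1
inclusion_prob = trace['gamma'].mean(axis=0)

# continuous model: average the soft indicator invlogit(lambda_hat) per coefficient
soft_inclusion = (1 / (1 + np.exp(-trace['lambda_hat']))).mean(axis=0)

# coefficients considered "included" under the chosen threshold
selected = np.where(inclusion_prob > 0.5)[0]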

I recently read a paper (https://royalsocietypublishing.org/doi/pdf/10.1098/rsif.2018.0572) which claims that their continuous version of spike and slab outperforms the horseshoe prior; it is based on the inverse logit, which I implemented above.


Interesting. I stand corrected with regard to my statement about the horseshoe - I look forward to taking a look at that paper.


That’s a really interesting paper, will have to try it later, thanks!