Seeds: Random effect logistic regression

https://www.multibugs.org/examples/latest/Seeds.html

I am trying to solve this example using PyMC3, but I am getting a mean and SD of 0. For sampling I have tried HamiltonianMC and Metropolis.

Any hint would be appreciated!


Welcome!

I would suggest checking out this notebook for an example of logistic regression and this notebook or possibly this one for an example of how to build in “random effects”. If you have specific questions, feel free to post them.

Thanks @cluhmann for your response.

Basically I have tried HamiltonianMC and variational inference. Below is the code, but I am not getting satisfactory results. Probably I am not using the correct hyperparameters?

I am trying to solve the Seeds problem. In my case, the mean and SD are 0, nowhere close to the numbers they should be.

I think the biggest issue is that the SDs of your priors are very small. Things seem to run OK once they are set to 1.0.

As a side note, you can also use the built-in invlogit() if you want to skip doing some of the intermediate calculations by hand:

obs = pm.Binomial('obs',
                  n=n,
                  p=pm.math.invlogit(logit_p),
                  observed=r)
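
Using invlogit() also tends to be more numerically stable than computing exp(logit_p) / (1 + exp(logit_p)) by hand, since the sigmoid is evaluated in a way that avoids overflow for large logits.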

Yes, I tried changing the SD before and stopped getting 0, but we have to keep the priors from the original model:

     alpha0 ~ dnorm(0.0,1.0E-6)
     alpha1 ~ dnorm(0.0,1.0E-6)
     alpha2 ~ dnorm(0.0,1.0E-6)
     alpha12 ~ dnorm(0.0,1.0E-6)
     sigma ~ dunif(0,10)
     tau <- 1 / pow(sigma, 2)

The example considered here (Seeds) is done in R and was tried with 2 sets of initialization values, never getting 0. So I am thinking I am making some mistake or missing some hyperparameter in the sample function.

I am not sure if R does it better.

I’m not sure what you mean by “getting zero”.

I saw the small (1.0E-6) SDs in the model you referenced. Maybe increasing the number of tuning steps would help (maybe 5,000 or 10,000?).
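
For reference, a minimal sketch of what that might look like (the draws value here is illustrative, and model is assumed to be your model context):

with model:
    # more tuning steps give the sampler's step-size and
    # mass-matrix adaptation a better chance to settle
    trace = pm.sample(draws=2000, tune=10000)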

Oh, I am sorry for not being specific. I meant that the mean and variance in my case were always zero. When you suggested changing the variance, my answer was that by changing the SD to 1, I stopped getting zero as the mean and variance.

As per your advice, I tried with 5,000 and 10,000 tuning steps, but still no luck.

Luckily I found a sample Python implementation using PyMC3: https://github.com/aflaxman/pymc-examples/blob/master/seeds_re_logistic_regression_pymc3.ipynb. In it, the author used almost the same mean and variance as defined in the MultiBUGS Seeds example and got a non-zero mean and variance. But when I ran the same code, I got a mean and variance of 0.

I am doing something wrong somewhere for sure.

@cluhmann - As per your advice, I have changed the prior SD to 1 and can see that the mean and variance approximately match the MultiBUGS Seeds example.
But what justification should I give for modifying the standard deviation, since I am now using a completely different model from the original setup?
What I mean is: how do we decide which distribution and initial values to use for the alphas, beta, sigma, etc.? Please advise.

The results on that page suggest that some of the posterior estimates (e.g., alpha2) are very far from zero. So constructing a prior with a mean of 0 and an SD of 1e-6 suggests that those results are not very credible a priori. Another way to think about it is that those priors imply that the observed data is (nearly) impossible. Once you set the SD to 1, the data is much more plausible as far as the model is concerned.
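
To make that concrete, here is a quick sketch (1.3 is just an illustrative value in the vicinity of the published alpha2 estimate):

import scipy.stats as st

# Under a Normal(0, sd=1e-6) prior, a coefficient anywhere near 1.3
# has essentially zero density; under sd=1 it is entirely plausible.
print(st.norm(0, 1e-6).pdf(1.3))  # 0.0 to machine precision
print(st.norm(0, 1.0).pdf(1.3))   # ~0.17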

That absolutely makes sense. Thanks a zillion.

I am still in a dilemma about how, in R, they are getting these values with such a small SD.

Sorry, one last question: in the MultiBUGS Seeds example, I see there is one point where values are initialized as below, and I am planning to try this now. How do we do that in PyMC3?

Inits for chain 1

list(alpha0 = 0, alpha1 = 0, alpha2 = 0, alpha12 = 0, sigma = 8)

Inits for chain 2

list(alpha0 = 0, alpha1 = 0, alpha2 = 0, alpha12 = 0, sigma = 1,beta = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0))

You should be able to do something like this:

pm.sample(start={'alpha_0':0, 
                 'alpha_1':0, 
                 'alpha_2':0, 
                 'alpha_12':0, 
                 # etc
                }
         )
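
If you want each chain to start from different values, as in the BUGS listing, I believe start can also be given as a list of dicts in PyMC3, one per chain. A sketch:

pm.sample(chains=2,
          start=[
              # chain 1 inits (mirroring the BUGS example)
              {'alpha_0': 0., 'alpha_1': 0., 'alpha_2': 0.,
               'alpha_12': 0., 'sigma': 8.},
              # chain 2 inits
              {'alpha_0': 0., 'alpha_1': 0., 'alpha_2': 0.,
               'alpha_12': 0., 'sigma': 1., 'b': np.zeros(N)},
          ])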

My understanding was that the parameters to dnorm are in terms of mean and precision, not mean and variance/SD.


@ckrapu - wow, thanks for the pointer. I just went through the MultiBUGS documentation and found that it is not the SD but tau. You are absolutely correct; thanks for jumping in and being a savior.

In the BUGS language it is used as
x ~ dnorm(mu, tau)

@cluhmann - I am sorry, I got confused and thought this was R code, whereas it is MultiBUGS-specific code. I found out after running the example in the MultiBUGS IDE (Tutorial).

As soon as I changed my code to the below in PyMC3, I can see values approximately matching the results in MultiBUGS:

alpha_0 = pm.Normal('alpha_0', mu=0., tau=1e-6)
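
For anyone following along: tau is a precision, so tau = 1e-6 corresponds to sd = 1 / sqrt(tau) = 1000, i.e., an extremely diffuse prior rather than an extremely tight one:

# precision tau and standard deviation sd are related by sd = 1 / sqrt(tau)
tau = 1e-6
sd = 1 / tau ** 0.5   # 1000.0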

Let me try a few things; I might seek some more help.

Thanks a zillion again @cluhmann and @ckrapu


Folks, I am facing a similar issue. My code is below (cc @cluhmann and @ckrapu):

import numpy as np
import pymc3 as pm
import theano.tensor as T

# Model:
#   r_i ~ Binomial(p_i, n_i)
#   logit(p_i) = alpha_0 + alpha_1*x1_i + alpha_2*x2_i + alpha_12*x1_i*x2_i + b_i
#   b_i ~ Normal(0, tau)

# germinated seeds
r = np.array([10, 23, 23, 26, 17, 5, 53, 55, 32, 46, 10, 8, 10, 8, 23, 0, 3, 22, 15, 32, 3])
# total seeds
n = np.array([39, 62, 81, 51, 39, 6, 74, 72, 51, 79, 13, 16, 30, 28, 45, 4, 12, 41, 30, 51, 7])
# seed type
x1 = np.array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1])
# root type
x2 = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1])
# number of plates
N = x1.shape[0]

with pm.Model() as m:
    ### hyperpriors
    sigma = pm.Uniform('sigma', lower=0., upper=10.)
    tau = 1 / sigma ** 2

    ### parameters
    # fixed effects
    alpha_0 = pm.Normal('alpha_0', mu=0., tau=1e-6)
    alpha_1 = pm.Normal('alpha_1', mu=0., tau=1e-6)
    alpha_2 = pm.Normal('alpha_2', mu=0., tau=1e-6)
    alpha_12 = pm.Normal('alpha_12', mu=0., tau=1e-6)

    # random effect
    b = pm.Normal('b', 0., tau, shape=(N,))

    # expected parameter
    logit_p = alpha_0 + alpha_1 * x1 + alpha_2 * x2 + alpha_12 * x1 * x2 + b
    p = T.exp(logit_p) / (1 + T.exp(logit_p))

    ### likelihood
    obs = pm.Binomial('obs', n, p, observed=r)

draws = 10000

with m:
    start = pm.find_MAP({'sigma': 8., 'alpha_0': 0., 'alpha_1': 0., 'alpha_2': 0., 'alpha_12': 0.})
    step = pm.HamiltonianMC(scaling=start)
    ptrace = pm.sample(draws, step=step, start=start, progressbar=False)

    start = pm.find_MAP({'sigma': 1., 'alpha_0': 0., 'alpha_1': 0., 'alpha_2': 0., 'alpha_12': 0., 'b': np.zeros(N)})
    step = pm.HamiltonianMC(scaling=start)
    ptrace = pm.sample(draws, step=step, start=start, progressbar=False)

burn = 1000
pm.summary(ptrace[burn:])

The results are nowhere close to the actual parameters. Any help is appreciated.

You shouldn’t be using the scaling keyword argument with the values you have. I’m surprised it lets you sample at all, because that argument sets the diagonal terms of a preconditioning matrix. If you set some of those values to zero, you are forcing the sampler to make moves of size zero, going nowhere. I recommend getting rid of that and using NUTS with no additional arguments.

Also, in b = pm.Normal('b', 0., tau, shape=(N,)), this should probably be tau=tau. Right now I think tau is being passed positionally as the SD.
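
Putting both suggestions together, a minimal sketch of the corrected model and sampling code (the draws/tune values are illustrative, and invlogit() replaces the hand-rolled sigmoid as suggested earlier in the thread):

with pm.Model() as m:
    sigma = pm.Uniform('sigma', lower=0., upper=10.)

    alpha_0 = pm.Normal('alpha_0', mu=0., tau=1e-6)
    alpha_1 = pm.Normal('alpha_1', mu=0., tau=1e-6)
    alpha_2 = pm.Normal('alpha_2', mu=0., tau=1e-6)
    alpha_12 = pm.Normal('alpha_12', mu=0., tau=1e-6)

    # pass the precision by keyword so it is not silently treated as the SD
    b = pm.Normal('b', mu=0., tau=1 / sigma ** 2, shape=N)

    logit_p = alpha_0 + alpha_1 * x1 + alpha_2 * x2 + alpha_12 * x1 * x2 + b
    obs = pm.Binomial('obs', n, pm.math.invlogit(logit_p), observed=r)

    # let pm.sample default to NUTS with automatic tuning
    ptrace = pm.sample(draws=2000, tune=2000)

pm.summary(ptrace)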

Hi @ckrapu @cluhmann
Thanks a lot for the help. It works.
One last query: I want to use this model for prediction now. I am following this:

https://docs.pymc.io/en/v3/pymc-examples/examples/diagnostics_and_criticism/posterior_predictive.html

but I am not able to identify clearly how to implement it in my case. We are looking for a simple example.

Can you be more specific?

Hi @cluhmann

Basically I am referring to the [Seeds: Random effect logistic regression] model that I implemented using PyMC3. I want to use it for prediction now: I want to use the learnt posterior distribution to make predictions on either held-out test data points or the training data points, and find the prediction error (if any).

For “predicting” the training data, you can use sample_posterior_predictive() as illustrated in this notebook. For predicting new data, you can check out the answer in this thread for how to use sample_posterior_predictive() and set_data() to do so.
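
In case it helps, here is a simplified sketch of that workflow (the random effect is omitted for brevity, the covariates are wrapped in pm.Data containers so they can be swapped out, and x1_new, x2_new, n_new are hypothetical arrays of the same length as the training data):

import numpy as np
import pymc3 as pm

with pm.Model() as m:
    # mutable data containers so inputs can be replaced later
    x1_d = pm.Data('x1_d', x1)
    x2_d = pm.Data('x2_d', x2)
    n_d = pm.Data('n_d', n)

    alpha_0 = pm.Normal('alpha_0', mu=0., sd=1.)
    alpha_1 = pm.Normal('alpha_1', mu=0., sd=1.)
    alpha_2 = pm.Normal('alpha_2', mu=0., sd=1.)
    alpha_12 = pm.Normal('alpha_12', mu=0., sd=1.)

    logit_p = alpha_0 + alpha_1 * x1_d + alpha_2 * x2_d + alpha_12 * x1_d * x2_d
    obs = pm.Binomial('obs', n_d, pm.math.invlogit(logit_p), observed=r)

    trace = pm.sample()

    # in-sample posterior predictive draws
    ppc_train = pm.sample_posterior_predictive(trace)

    # swap in new inputs and predict out of sample
    pm.set_data({'x1_d': x1_new, 'x2_d': x2_new, 'n_d': n_new})
    ppc_test = pm.sample_posterior_predictive(trace)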