Seeds: Random effect logistic regression

@ckrapu - wow, thanks for the pointer. I just went through the MultiBUGS documentation and found that the second argument of dnorm is not the sd but tau (the precision). You are absolutely correct; thanks for jumping in and being a savior.

In the BUGS language it is used as
x ~ dnorm(mu, tau)

@cluhmann - I am sorry, I got confused and thought this was R code, whereas it is MultiBUGS-specific code. I realized this after running the example in the MultiBUGS IDE (Tutorial).

As soon as I changed my code to the following in PyMC3, I see values approximately matching the results in MultiBUGS.
alpha_0 = pm.Normal('alpha_0', mu=0., tau=1e-6)

Let me try a few things and I might seek some more help.

Thanks a zillion again @cluhmann and @ckrapu


Folks, I am facing a similar issue. My code is below (@cluhmann and @ckrapu):

import numpy as np
import pymc3 as pm
import arviz as az
import theano.tensor as T

# r_i ~ Binomial(p_i, n_i)
# logit(p_i) = alpha_0 + alpha_1*x1_i + alpha_2*x2_i + alpha_12*x1_i*x2_i + b_i
# b_i ~ Normal(0, tau)

# germinated seeds
r = np.array([10, 23, 23, 26, 17, 5, 53, 55, 32, 46, 10, 8, 10, 8, 23, 0, 3, 22, 15, 32, 3])
# total seeds
n = np.array([39, 62, 81, 51, 39, 6, 74, 72, 51, 79, 13, 16, 30, 28, 45, 4, 12, 41, 30, 51, 7])
# seed type
x1 = np.array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1])
# root type
x2 = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1])
# number of plates
N = x1.shape[0]

with pm.Model() as m:
    ### hyperpriors
    sigma = pm.Uniform('sigma', lower=0., upper=10.)
    tau = 1 / pow(sigma, 2)

    ### parameters
    # fixed effects
    alpha_0 = pm.Normal('alpha_0', mu=0., tau=1e-6)
    alpha_1 = pm.Normal('alpha_1', mu=0., tau=1e-6)
    alpha_2 = pm.Normal('alpha_2', mu=0., tau=1e-6)
    alpha_12 = pm.Normal('alpha_12', mu=0., tau=1e-6)
    # random effect
    b = pm.Normal('b', 0., tau, shape=(N,))

    # expected parameter
    logit_p = (alpha_0 + alpha_1 * x1 + alpha_2 * x2 + alpha_12 * x1 * x2 + b)
    p = T.exp(logit_p) / (1 + T.exp(logit_p))

    ### likelihood
    obs = pm.Binomial('obs', n, p, observed=r)

n_samples = 10000

with m:
    start = pm.find_MAP({'sigma': 8., 'alpha_0': 0., 'alpha_1': 0., 'alpha_2': 0., 'alpha_12': 0.})
    step = pm.HamiltonianMC(scaling=start)
    ptrace = pm.sample(n_samples, step, start, progressbar=False)

    start = pm.find_MAP({'sigma': 1., 'alpha_0': 0., 'alpha_1': 0., 'alpha_2': 0., 'alpha_12': 0., 'b': np.zeros(N)})
    step = pm.HamiltonianMC(scaling=start)
    ptrace = pm.sample(n_samples, step, start, progressbar=False)

burn = 1000
pm.summary(ptrace[burn:])

The results are nowhere near the actual parameters. Any help is appreciated.

You shouldn’t be using the scaling keyword argument with the values you have. I’m surprised it lets you sample at all, because that argument sets the diagonal terms of a preconditioning matrix. If you set some of those values to zero, you are forcing the sampler to make moves of size zero, going nowhere. I recommend getting rid of that and using NUTS with no additional arguments.
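For instance, something along these lines should be enough (a sketch; it simply drops the custom step and scaling and lets PyMC3 pick NUTS and tune itself):

with m:
    # default sampler: NUTS with automatic tuning of step size and mass matrix
    ptrace = pm.sample(2000, tune=1000, progressbar=False)

pm.summary(ptrace)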

The line b = pm.Normal('b', 0., tau, shape=(N,)) should probably use tau=tau. Right now I think it is passing tau positionally, so it is being treated as the SD.
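In other words, a minimal sketch of the keyword fix:

# pass tau by keyword so it is treated as a precision, not a standard deviation
b = pm.Normal('b', mu=0., tau=tau, shape=N)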

Hi @ckrapu @cluhmann
Thanks a lot for the help. It works.
One last query: I want to use this model for prediction now. I am following this:

https://docs.pymc.io/en/v3/pymc-examples/examples/diagnostics_and_criticism/posterior_predictive.html

but I am not able to work out how to implement it in my case. We are looking for a simple example.

Can you be more specific?

Hi @cluhmann

Basically, I am referring to the [Seeds: Random effect logistic regression] model that I implemented using PyMC3. I now want to use it for prediction: use the learnt posterior distribution to make predictions on either held-out test data points or the training data points, and find the prediction error (if any).

For “predicting” the training data, you can use sample_posterior_predictive() as illustrated in this notebook. For predicting new data, you can check out the answer in this thread for how to use sample_posterior_predictive() and set_data() to do so.
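A rough sketch of the held-out-data route, assuming the model's predictors were wrapped in pm.Data containers (the names x1_d, x2_d, n_d and the *_new arrays below are illustrative, not from the thread):

with m:
    # point the data containers at the held-out predictors and trial counts
    pm.set_data({'x1_d': x1_new, 'x2_d': x2_new, 'n_d': n_new})
    # draw predictions for the new data from the learnt posterior
    ppc = pm.sample_posterior_predictive(ptrace, var_names=['obs'], random_seed=15)
    preds = ppc['obs']   # array of shape (draws, number of new plates)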

Sure Thanks @cluhmann

I was wondering if we can add the median to the trace summary; currently it shows the mean and sd only.

You can see how to add custom statistics to the summary here. For a quick solution, you can just do this:

idata = pm.sample(return_inferencedata=True)
print(idata.posterior.median())
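If you want the median to appear in the summary table itself, arviz's stat_funcs argument should do it (a sketch, assuming idata holds the trace):

import numpy as np
import arviz as az

# add a median column next to the default mean/sd columns
az.summary(idata, stat_funcs={"median": np.median})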

@cluhmann I have a simple question.

Below is the data given.

# germinated seeds
r =  np.array([10, 23, 23, 26, 17, 5, 53, 55, 32, 46, 10, 8, 10, 8, 23, 0, 3, 22, 15, 32, 3])

# total seeds
n =  np.array([39, 62, 81, 51, 39, 6, 74, 72, 51, 79, 13, 16, 30, 28, 45, 4, 12, 41, 30, 51, 7])

# seed type
x1 = np.array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1])

# root type
x2 = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1])

# number of plates
N =  21

In the MultiBUGS sample code given, they loop over the data. Do we need to do the same thing in PyMC3? I am thinking the model does this for us, but then I see something like ‘pm.Data’ in the API. Now I am confused…

for (i in 1:N) {
    r[i] ~ dbin(p[i], n[i])
    b[i] ~ dnorm(0, tau)
    logit(p[i]) <- alpha0 + alpha1 * x1[i] + alpha2 * x2[i]
                 + alpha12 * x1[i] * x2[i] + b[i]
}

Please advise.

PyMC uses “vectorized” computation and thus avoids the need for loops.

The pm.Data object allows data (e.g., a numpy array) to be wrapped in a data “container”. This allows you to do things like replace your old data with new data just by calling pm.set_data().
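As a sketch of how the BUGS loop maps onto vectorized PyMC3 code (the container names are illustrative, and only two predictors and a couple of priors are shown):

import numpy as np
import pymc3 as pm

x1 = np.array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1])
x2 = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1])

with pm.Model() as demo:
    # pm.Data wraps the arrays so they can later be swapped out with pm.set_data()
    x1_d = pm.Data('x1_d', x1)
    x2_d = pm.Data('x2_d', x2)

    alpha_0 = pm.Normal('alpha_0', mu=0., tau=1e-6)
    alpha_1 = pm.Normal('alpha_1', mu=0., tau=1e-6)
    alpha_2 = pm.Normal('alpha_2', mu=0., tau=1e-6)

    # this single expression covers every i at once; no "for (i in 1:N)" loop needed
    logit_p = alpha_0 + alpha_1 * x1_d + alpha_2 * x2_d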

I got it. Thanks for the explanation !

Hi @cluhmann
I am trying to use my model to plot the predictive error with the code below, but I see some weird probability values on the Y axis. Kindly advise on this.

import matplotlib.pyplot as plt

predictors_out_of_sample = np.random.normal(0,1,size=21).astype('int64')
outcomes_out_of_sample = np.random.binomial( 1,predictors_out_of_sample.mean().astype('int64'),size=21)

with m:
    # update values of predictors:
    pm.set_data({"data": predictors_out_of_sample})
    # use the updated values and predict outcomes and probabilities:
    posterior_predictive = pm.sample_posterior_predictive(
        ptrace1, var_names=["obs"], random_seed=15
    )
    model_preds = posterior_predictive["obs"]

_, ax = plt.subplots(figsize=(12, 6))

# uncertainty about the estimates:
ax.plot(
    [predictors_out_of_sample, predictors_out_of_sample],
    az.hdi(model_preds).T,
    lw=6,
    color="#00204C",
    alpha=0.8,
)
# expected probability of success:
ax.plot(
    predictors_out_of_sample,
    model_preds.mean(0),
    "o",
    ms=5,
    color="#FFE945",
    alpha=0.8,
    label="Expected prob.",
)

# actual outcomes:
ax.scatter(
    x=predictors_out_of_sample,
    y=outcomes_out_of_sample,
    marker="x",
    color="#A69C75",
    alpha=0.8,
    label="Observed outcomes",
)

# true probabilities:
x = np.linspace(predictors_out_of_sample.min() - 0.1, predictors_out_of_sample.max() + 0.1,num=21)
ax.plot(
    x,
    predictors_out_of_sample,
    lw=2,
    ls="--",
    color="#565C6C",
    alpha=0.8,
    label="True prob.",
)

ax.set_xlabel("Predictor")
ax.set_ylabel("Prob. of success")
ax.set_title("Out-of-sample Predictions")
ax.legend(fontsize=10, frameon=True, framealpha=0.5);

Kindly advise on this.

Those aren’t posterior values of a probability, they’re posterior samples of counts ranging from 0 to n. To get the samples of p, I would add a line to your original model statement along the lines of

pm.Deterministic('p', p)

and then extract its values from the trace.
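Roughly like this (a sketch; only the changed line and the extraction step are shown, everything else in the model stays as before):

with pm.Model() as m:
    # ... hyperpriors, alphas, b, and logit_p exactly as in the original model ...
    p = pm.Deterministic('p', T.exp(logit_p) / (1 + T.exp(logit_p)))
    obs = pm.Binomial('obs', n, p, observed=r)

# after sampling, the trace holds per-plate probabilities
p_samples = ptrace['p']            # shape: (draws, N)
p_mean = p_samples.mean(axis=0)    # one expected probability per plate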


@cluhmann
OK, I tried as below. I can see the p values for all n while printing the summary of the trace, but in the end I see this error:
TypeError                                 Traceback (most recent call last)
<ipython-input> in <module>()
     54     color="#565C6C",
     55     alpha=0.8,
---> 56     label="True prob.",
     57 )
     58

3 frames
/usr/local/lib/python3.7/dist-packages/matplotlib/cbook/__init__.py in _check_1d(x)
   1323     dimension; leaves everything else untouched.
   1324     '''
-> 1325     if not hasattr(x, 'shape') or len(x.shape) < 1:
   1326         return np.atleast_1d(x)
   1327     else:

TypeError: object of type 'TensorVariable' has no len()

import matplotlib.pyplot as plt

predictors_out_of_sample = np.random.normal(0,1,size=21).astype('int64')
outcomes_out_of_sample = np.random.binomial( 1,predictors_out_of_sample.mean().astype('int64'),size=21)

with m:
    # update values of predictors:
    pm.set_data({"data": n.astype('int64')})
    # use the updated values and predict outcomes and probabilities:
    posterior_predictive = pm.sample_posterior_predictive(
        ptrace1, var_names=["p"], random_seed=15
    )
    model_preds = posterior_predictive["p"]

_, ax = plt.subplots(figsize=(12, 6))

# uncertainty about the estimates:
ax.plot(
    [n, n],
    az.hdi(model_preds).T,
    lw=6,
    color="#00204C",
    alpha=0.8,
)
# expected probability of success:
ax.plot(
    n,
    model_preds.mean(0),
    "o",
    ms=5,
    color="#FFE945",
    alpha=0.8,
    label="Expected prob.",
)

# actual outcomes:
ax.scatter(
    x=n,
    y=r/n,
    marker="x",
    color="#A69C75",
    alpha=0.8,
    label="Observed outcomes",
)

# # true probabilities:
x = np.linspace(n.min() - 0.1, n.max() + 0.1,num=21)
ax.plot(
    n,
    p,
    lw=2,
    ls="--",
    color="#565C6C",
    alpha=0.8,
    label="True prob.",
)

ax.set_xlabel("Predictor")
ax.set_ylabel("Prob. of success")
ax.set_title("Out-of-sample Predictions")
ax.legend(fontsize=10, frameon=True, framealpha=0.5);

You’re trying to plot a TensorVariable, which won’t work. You need to extract those values from the posterior predictive trace first, if they are in there.
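For example, something like this (a sketch; it assumes 'p' was included via var_names in sample_posterior_predictive, as in your call above):

p_samples = posterior_predictive["p"]   # a plain numpy array, shape (draws, N)
p_mean = p_samples.mean(axis=0)         # plain numpy, safe to hand to matplotlib

ax.plot(n, p_mean, lw=2, ls="--", color="#565C6C", alpha=0.8, label="Posterior mean prob.")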


Hi @ckrapu, thanks for the hint. I am doing something like this but I still see erroneous plots. Not sure where I am going wrong:

ax.plot(
    n,
    ptrace1["p"][1],
    lw=2,
    ls="--",
    color="#565C6C",
    alpha=0.8,
    label="True prob.",
)

@cluhmann @ckrapu I see the curve as below; does it look correct?

Tried this as well

import matplotlib.pyplot as plt



with m:
    # update values of predictors:
    pm.set_data({"data": n.astype('int64')})
    # use the updated values and predict outcomes and probabilities:
    posterior_predictive = pm.sample_posterior_predictive(
        ptrace1, var_names=["p"], random_seed=15
    )
    model_preds = posterior_predictive["p"]
p_values = np.zeros(shape=21)

for l in range(0, 21):
    p_values[l] = ptrace1["p"][1][l].mean()

_, ax = plt.subplots(figsize=(12, 6))

# uncertainty about the estimates:
ax.plot(
    [n, n],
    az.hdi(model_preds).T,
    lw=6,
    color="#00204C",
    alpha=0.8,
)
# expected probability of success:
ax.plot(
    n,
    model_preds.mean(0),
    "o",
    ms=5,
    color="#FFE945",
    alpha=0.8,
    label="Expected prob.",
)

# actual outcomes:
ax.scatter(
    x=n,
    y=r/n,
    marker="x",
    color="#A69C75",
    alpha=0.8,
    label="Observed outcomes",
)

# # true probabilities:
# x = np.linspace(n.min() - 0.1, n.max() + 0.1,num=21)
ax.plot(
    n,
    p_values,
    lw=2,
    ls="--",
    color="#565C6C",
    alpha=0.8,
    label="True prob.",
)

ax.set_xlabel("Predictor")
ax.set_ylabel("Prob. of success")
ax.set_title("Out-of-sample Predictions")
ax.legend(fontsize=10, frameon=True, framealpha=0.5);