Seeds: Random effect logistic regression

Sure. Thanks @cluhmann!

I was wondering if we can add the median to the trace summary; currently it shows only the mean and sd.

You can see how to add custom statistics to the summary here. For a quick solution, you can just do this:

idata = pm.sample(return_inferencedata=True)
print(idata.posterior.median())
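For the longer-term solution, ArviZ's `az.summary` accepts a `stat_funcs` argument, so you can append a median column to the default summary yourself. A minimal sketch with a toy posterior (the dict-based InferenceData is just a stand-in for your real `idata`):

```python
import numpy as np
import arviz as az

# toy posterior: 2 chains x 100 draws of a single variable "mu"
idata = az.from_dict(posterior={"mu": np.random.normal(size=(2, 100))})

# extend the default summary (mean, sd, hdi, ...) with a median column
summ = az.summary(idata, stat_funcs={"median": np.median})
print(summ)
```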

@cluhmann I have a quick question.

Below is the data given.

# germinated seeds
r =  np.array([10, 23, 23, 26, 17, 5, 53, 55, 32, 46, 10, 8, 10, 8, 23, 0, 3, 22, 15, 32, 3])

# total seeds
n =  np.array([39, 62, 81, 51, 39, 6, 74, 72, 51, 79, 13, 16, 30, 28, 45, 4, 12, 41, 30, 51, 7])

# seed type
x1 = np.array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1])

# root type
x2 = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1])

# number of plates
N =  21

In the MULTIBUGS sample code, they loop over the data. Do we need to do the same thing in pymc3? I assumed the model handles this for us, but I also see something like `pm.Data` in the API. Now I am confused…

for (i in 1:N) {
         r[i] ~ dbin(p[i], n[i])
         b[i] ~ dnorm(0, tau)
         logit(p[i]) <- alpha0 + alpha1 * x1[i] + alpha2 * x2[i]
                     + alpha12 * x1[i] * x2[i] + b[i]
}

Please advise.

PyMC uses “vectorized” computation and thus avoids the need for loops.
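For intuition, here is the BUGS linear predictor computed both ways in plain NumPy; the single vectorized expression produces the same 21 values as the explicit loop (the alpha values below are made up purely for illustration):

```python
import numpy as np

# seed type and root type from the data above
x1 = np.array([0]*11 + [1]*10)
x2 = np.array([0]*5 + [1]*6 + [0]*5 + [1]*5)

alpha0, alpha1, alpha2, alpha12 = -0.5, 0.1, 1.3, -0.8  # made-up values

# BUGS-style loop, one plate at a time
loop_logit = np.empty(21)
for i in range(21):
    loop_logit[i] = alpha0 + alpha1*x1[i] + alpha2*x2[i] + alpha12*x1[i]*x2[i]

# vectorized: one expression over the whole arrays
vec_logit = alpha0 + alpha1*x1 + alpha2*x2 + alpha12*x1*x2

print(np.allclose(loop_logit, vec_logit))  # True
```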

The pm.Data object allows data (e.g., a numpy array) to be wrapped in a data “container”. This allows you to do things like replace your old data with new data just by calling pm.set_data().

I got it. Thanks for the explanation !

Hi @cluhmann
I am trying to use my model to plot the predictive error with the code below, but I see some weird probability values on the Y axis. Kindly advise on this.

import matplotlib.pyplot as plt

predictors_out_of_sample = np.random.normal(0, 1, size=21).astype('int64')
outcomes_out_of_sample = np.random.binomial(1, predictors_out_of_sample.mean().astype('int64'), size=21)

with m:
    # update values of predictors:
    pm.set_data({"data": predictors_out_of_sample})
    # use the updated values and predict outcomes and probabilities:
    posterior_predictive = pm.sample_posterior_predictive(
        ptrace1, var_names=["obs"], random_seed=15
    )
    model_preds = posterior_predictive["obs"]

_, ax = plt.subplots(figsize=(12, 6))

# uncertainty about the estimates:
ax.plot(
    [predictors_out_of_sample, predictors_out_of_sample],
    az.hdi(model_preds).T,
    lw=6,
    color="#00204C",
    alpha=0.8,
)
# expected probability of success:
ax.plot(
    predictors_out_of_sample,
    model_preds.mean(0),
    "o",
    ms=5,
    color="#FFE945",
    alpha=0.8,
    label="Expected prob.",
)

# actual outcomes:
ax.scatter(
    x=predictors_out_of_sample,
    y=outcomes_out_of_sample,
    marker="x",
    color="#A69C75",
    alpha=0.8,
    label="Observed outcomes",
)

# true probabilities:
x = np.linspace(predictors_out_of_sample.min() - 0.1, predictors_out_of_sample.max() + 0.1,num=21)
ax.plot(
    x,
    predictors_out_of_sample,
    lw=2,
    ls="--",
    color="#565C6C",
    alpha=0.8,
    label="True prob.",
)

ax.set_xlabel("Predictor")
ax.set_ylabel("Prob. of success")
ax.set_title("Out-of-sample Predictions")
ax.legend(fontsize=10, frameon=True, framealpha=0.5);

Kindly advise on this.

Those aren’t posterior values of a probability, they’re posterior samples of counts ranging from 0 to n. To get the samples of p, I would add a line to your original model statement along the lines of

pm.Deterministic('p', p)

and then extract its values from the trace.
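Once `'p'` is tracked, its posterior mean per plate is just an average over draws. A toy sketch, with a fake trace dict standing in for your real `ptrace1` (1000 draws × 21 plates):

```python
import numpy as np

rng = np.random.default_rng(0)
# fake draws standing in for the real trace's "p" values
ptrace1 = {"p": rng.uniform(size=(1000, 21))}

p_mean = ptrace1["p"].mean(axis=0)  # posterior mean of p for each of the 21 plates
p_sd = ptrace1["p"].std(axis=0)     # posterior sd, if you want error bars

print(p_mean.shape)  # (21,)
```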


@cluhmann
OK, I tried it as below. I can see the p values for all n when printing the trace summary, but at the end I see this error:
TypeError                                 Traceback (most recent call last)
in ()
     54     color="#565C6C",
     55     alpha=0.8,
---> 56     label="True prob.",
     57 )
     58

3 frames
/usr/local/lib/python3.7/dist-packages/matplotlib/cbook/__init__.py in _check_1d(x)
   1323     dimension; leaves everything else untouched.
   1324     '''
-> 1325     if not hasattr(x, 'shape') or len(x.shape) < 1:
   1326         return np.atleast_1d(x)
   1327     else:

TypeError: object of type 'TensorVariable' has no len()

import matplotlib.pyplot as plt

predictors_out_of_sample = np.random.normal(0, 1, size=21).astype('int64')
outcomes_out_of_sample = np.random.binomial(1, predictors_out_of_sample.mean().astype('int64'), size=21)

with m:
    # update values of predictors:
    pm.set_data({"data": n.astype('int64')})
    # use the updated values and predict outcomes and probabilities:
    posterior_predictive = pm.sample_posterior_predictive(
        ptrace1, var_names=["p"], random_seed=15
    )
    model_preds = posterior_predictive["p"]

_, ax = plt.subplots(figsize=(12, 6))

# uncertainty about the estimates:
ax.plot(
    [n, n],
    az.hdi(model_preds).T,
    lw=6,
    color="#00204C",
    alpha=0.8,
)
# expected probability of success:
ax.plot(
    n,
    model_preds.mean(0),
    "o",
    ms=5,
    color="#FFE945",
    alpha=0.8,
    label="Expected prob.",
)

# actual outcomes:
ax.scatter(
    x=n,
    y=r/n,
    marker="x",
    color="#A69C75",
    alpha=0.8,
    label="Observed outcomes",
)

# # true probabilities:
x = np.linspace(n.min() - 0.1, n.max() + 0.1,num=21)
ax.plot(
    n,
    p,
    lw=2,
    ls="--",
    color="#565C6C",
    alpha=0.8,
    label="True prob.",
)

ax.set_xlabel("Predictor")
ax.set_ylabel("Prob. of success")
ax.set_title("Out-of-sample Predictions")
ax.legend(fontsize=10, frameon=True, framealpha=0.5);

You’re trying to plot a TensorVariable, which won’t work. You need to extract those values from the posterior predictive trace first, if they are in there.
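The same goes for the uncertainty bars: `az.hdi` wants plain arrays, and recent ArviZ prefers an explicit chain dimension for 2-D input. A sketch with fake posterior-predictive draws standing in for `posterior_predictive["p"]`:

```python
import numpy as np
import arviz as az

rng = np.random.default_rng(1)
model_preds = rng.uniform(size=(1000, 21))  # fake (draws, plates) array

# add a leading chain dimension -> (1, 1000, 21), then get the HDI per plate
hdi = az.hdi(model_preds[None])

print(hdi.shape)  # (21, 2): lower and upper bound for each plate
```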


Hi @ckrapu, thanks for the hint. I am doing something like this but still see erroneous plots. Not sure where I am going wrong.

ax.plot(
    n,
    ptrace1["p"][1],
    lw=2,
    ls="--",
    color="#565C6C",
    alpha=0.8,
    label="True prob.",
)

@cluhmann @ckrapu I see the curve as below; does it look correct?

Tried this as well

import matplotlib.pyplot as plt



with m:
    # update values of predictors:
    pm.set_data({"data": n.astype('int64')})
    # use the updated values and predict outcomes and probabilities:
    posterior_predictive = pm.sample_posterior_predictive(
        ptrace1, var_names=["p"], random_seed=15
    )
    model_preds = posterior_predictive["p"]
p_values = np.zeros(shape=21)

# posterior mean of p for each plate: average over all draws, not a single draw
for l in range(21):
    p_values[l] = ptrace1["p"][:, l].mean()

_, ax = plt.subplots(figsize=(12, 6))

# uncertainty about the estimates:
ax.plot(
    [n, n],
    az.hdi(model_preds).T,
    lw=6,
    color="#00204C",
    alpha=0.8,
)
# expected probability of success:
ax.plot(
    n,
    model_preds.mean(0),
    "o",
    ms=5,
    color="#FFE945",
    alpha=0.8,
    label="Expected prob.",
)

# actual outcomes:
ax.scatter(
    x=n,
    y=r/n,
    marker="x",
    color="#A69C75",
    alpha=0.8,
    label="Observed outcomes",
)

# # true probabilities:
# x = np.linspace(n.min() - 0.1, n.max() + 0.1,num=21)
ax.plot(
    n,
    p_values,
    lw=2,
    ls="--",
    color="#565C6C",
    alpha=0.8,
    label="True prob.",
)

ax.set_xlabel("Predictor")
ax.set_ylabel("Prob. of success")
ax.set_title("Out-of-sample Predictions")
ax.legend(fontsize=10, frameon=True, framealpha=0.5);