Seeds: Random effect logistic regression

Sure. Thanks @cluhmann!

I was wondering if we can add the median to the trace summary; currently it shows only the mean and sd.

You can see how to add custom statistics to the summary here. For a quick solution, you can just do this:

idata = pm.sample(return_inferencedata=True)
print(idata.posterior.median())
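For the longer-term solution, ArviZ's `az.summary` accepts a `stat_funcs` argument, so you can append a median column to the default summary yourself. A minimal sketch with a toy posterior (the dict-based InferenceData is just a stand-in for your real `idata`):

```python
import numpy as np
import arviz as az

# toy posterior: 2 chains x 100 draws of a single variable "mu"
idata = az.from_dict(posterior={"mu": np.random.normal(size=(2, 100))})

# extend the default summary (mean, sd, hdi, ...) with a median column
summ = az.summary(idata, stat_funcs={"median": np.median})
print(summ)
```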

@cluhmann I have a quick question.

Below is the data given.

# germinated seeds
r =  np.array([10, 23, 23, 26, 17, 5, 53, 55, 32, 46, 10, 8, 10, 8, 23, 0, 3, 22, 15, 32, 3])

# total seeds
n =  np.array([39, 62, 81, 51, 39, 6, 74, 72, 51, 79, 13, 16, 30, 28, 45, 4, 12, 41, 30, 51, 7])

# seed type
x1 = np.array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1])

# root type
x2 = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1])

# number of plates
N =  21

In the MULTIBUGS sample code, they loop over the data. Do we need to do the same thing in pymc3? I assumed the model handles this for us, but I also see something like `pm.Data` in the API. Now I am confused…

for (i in 1:N) {
         r[i] ~ dbin(p[i], n[i])
         b[i] ~ dnorm(0, tau)
         logit(p[i]) <- alpha0 + alpha1 * x1[i] + alpha2 * x2[i]
                     + alpha12 * x1[i] * x2[i] + b[i]
}

Please advise.

PyMC uses “vectorized” computation and thus avoids the need for loops.
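For intuition, here is the BUGS linear predictor computed both ways in plain NumPy; the single vectorized expression produces the same 21 values as the explicit loop (the alpha values below are made up purely for illustration):

```python
import numpy as np

# seed type and root type from the data above
x1 = np.array([0]*11 + [1]*10)
x2 = np.array([0]*5 + [1]*6 + [0]*5 + [1]*5)

alpha0, alpha1, alpha2, alpha12 = -0.5, 0.1, 1.3, -0.8  # made-up values

# BUGS-style loop, one plate at a time
loop_logit = np.empty(21)
for i in range(21):
    loop_logit[i] = alpha0 + alpha1*x1[i] + alpha2*x2[i] + alpha12*x1[i]*x2[i]

# vectorized: one expression over the whole arrays
vec_logit = alpha0 + alpha1*x1 + alpha2*x2 + alpha12*x1*x2

print(np.allclose(loop_logit, vec_logit))  # True
```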

The pm.Data object allows data (e.g., a numpy array) to be wrapped in a data “container”. This allows you to do things like replace your old data with new data just by calling pm.set_data().

I got it. Thanks for the explanation !

Hi @cluhmann
I am trying to use my model to plot the predictive error with the code below, but I see some weird probability values on the Y axis. Kindly advise on this.

import matplotlib.pyplot as plt

predictors_out_of_sample = np.random.normal(0, 1, size=21).astype('int64')
outcomes_out_of_sample = np.random.binomial(1, predictors_out_of_sample.mean().astype('int64'), size=21)

with m:
    # update values of predictors:
    pm.set_data({"data": predictors_out_of_sample})
    # use the updated values and predict outcomes and probabilities:
    posterior_predictive = pm.sample_posterior_predictive(
        ptrace1, var_names=["obs"], random_seed=15
    )
    model_preds = posterior_predictive["obs"]

_, ax = plt.subplots(figsize=(12, 6))

# uncertainty about the estimates:
ax.plot(
    [predictors_out_of_sample, predictors_out_of_sample],
    az.hdi(model_preds).T,
    lw=6,
    color="#00204C",
    alpha=0.8,
)
# expected probability of success:
ax.plot(
    predictors_out_of_sample,
    model_preds.mean(0),
    "o",
    ms=5,
    color="#FFE945",
    alpha=0.8,
    label="Expected prob.",
)

# actual outcomes:
ax.scatter(
    x=predictors_out_of_sample,
    y=outcomes_out_of_sample,
    marker="x",
    color="#A69C75",
    alpha=0.8,
    label="Observed outcomes",
)

# true probabilities:
x = np.linspace(predictors_out_of_sample.min() - 0.1, predictors_out_of_sample.max() + 0.1,num=21)
ax.plot(
    x,
    predictors_out_of_sample,
    lw=2,
    ls="--",
    color="#565C6C",
    alpha=0.8,
    label="True prob.",
)

ax.set_xlabel("Predictor")
ax.set_ylabel("Prob. of success")
ax.set_title("Out-of-sample Predictions")
ax.legend(fontsize=10, frameon=True, framealpha=0.5);

Kindly advise on this.

Those aren’t posterior values of a probability, they’re posterior samples of counts ranging from 0 to n. To get the samples of p, I would add a line to your original model statement along the lines of

pm.Deterministic('p', p)

and then extract its values from the trace.
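Once `'p'` is tracked, its posterior mean per plate is just an average over draws. A toy sketch, with a fake trace dict standing in for your real `ptrace1` (1000 draws × 21 plates):

```python
import numpy as np

rng = np.random.default_rng(0)
# fake draws standing in for the real trace's "p" values
ptrace1 = {"p": rng.uniform(size=(1000, 21))}

p_mean = ptrace1["p"].mean(axis=0)  # posterior mean of p for each of the 21 plates
p_sd = ptrace1["p"].std(axis=0)     # posterior sd, if you want error bars

print(p_mean.shape)  # (21,)
```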


@cluhmann
OK, I tried it as below. I can see the p values for all n when printing the trace summary, but at the end I see this error:
TypeError                                 Traceback (most recent call last)
in ()
     54     color="#565C6C",
     55     alpha=0.8,
---> 56     label="True prob.",
     57 )
     58

3 frames
/usr/local/lib/python3.7/dist-packages/matplotlib/cbook/__init__.py in _check_1d(x)
   1323     dimension; leaves everything else untouched.
   1324     '''
-> 1325     if not hasattr(x, 'shape') or len(x.shape) < 1:
   1326         return np.atleast_1d(x)
   1327     else:

TypeError: object of type 'TensorVariable' has no len()

import matplotlib.pyplot as plt

predictors_out_of_sample = np.random.normal(0, 1, size=21).astype('int64')
outcomes_out_of_sample = np.random.binomial(1, predictors_out_of_sample.mean().astype('int64'), size=21)

with m:
    # update values of predictors:
    pm.set_data({"data": n.astype('int64')})
    # use the updated values and predict outcomes and probabilities:
    posterior_predictive = pm.sample_posterior_predictive(
        ptrace1, var_names=["p"], random_seed=15
    )
    model_preds = posterior_predictive["p"]

_, ax = plt.subplots(figsize=(12, 6))

# uncertainty about the estimates:
ax.plot(
    [n, n],
    az.hdi(model_preds).T,
    lw=6,
    color="#00204C",
    alpha=0.8,
)
# expected probability of success:
ax.plot(
    n,
    model_preds.mean(0),
    "o",
    ms=5,
    color="#FFE945",
    alpha=0.8,
    label="Expected prob.",
)

# actual outcomes:
ax.scatter(
    x=n,
    y=r/n,
    marker="x",
    color="#A69C75",
    alpha=0.8,
    label="Observed outcomes",
)

# # true probabilities:
x = np.linspace(n.min() - 0.1, n.max() + 0.1,num=21)
ax.plot(
    n,
    p,
    lw=2,
    ls="--",
    color="#565C6C",
    alpha=0.8,
    label="True prob.",
)

ax.set_xlabel("Predictor")
ax.set_ylabel("Prob. of success")
ax.set_title("Out-of-sample Predictions")
ax.legend(fontsize=10, frameon=True, framealpha=0.5);

You’re trying to plot a TensorVariable, which won’t work. You need to extract those values from the posterior predictive trace first, if they are in there.
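The same goes for the uncertainty bars: `az.hdi` wants plain arrays, and recent ArviZ prefers an explicit chain dimension for 2-D input. A sketch with fake posterior-predictive draws standing in for `posterior_predictive["p"]`:

```python
import numpy as np
import arviz as az

rng = np.random.default_rng(1)
model_preds = rng.uniform(size=(1000, 21))  # fake (draws, plates) array

# add a leading chain dimension -> (1, 1000, 21), then get the HDI per plate
hdi = az.hdi(model_preds[None])

print(hdi.shape)  # (21, 2): lower and upper bound for each plate
```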


Hi @ckrapu, thanks for the hint. I am doing something like this but still see erroneous plots. Not sure where I am going wrong.

ax.plot(
    n,
    ptrace1["p"][1],
    lw=2,
    ls="--",
    color="#565C6C",
    alpha=0.8,
    label="True prob.",
)

@cluhmann @ckrapu I see the curve as below; does it look correct?

Tried this as well

import matplotlib.pyplot as plt



with m:
    # update values of predictors:
    pm.set_data({"data": n.astype('int64')})
    # use the updated values and predict outcomes and probabilities:
    posterior_predictive = pm.sample_posterior_predictive(
        ptrace1, var_names=["p"], random_seed=15
    )
    model_preds = posterior_predictive["p"]
p_values = np.zeros(shape=21)

# posterior mean of p for each plate: average over all draws, not a single draw
for l in range(21):
    p_values[l] = ptrace1["p"][:, l].mean()

_, ax = plt.subplots(figsize=(12, 6))

# uncertainty about the estimates:
ax.plot(
    [n, n],
    az.hdi(model_preds).T,
    lw=6,
    color="#00204C",
    alpha=0.8,
)
# expected probability of success:
ax.plot(
    n,
    model_preds.mean(0),
    "o",
    ms=5,
    color="#FFE945",
    alpha=0.8,
    label="Expected prob.",
)

# actual outcomes:
ax.scatter(
    x=n,
    y=r/n,
    marker="x",
    color="#A69C75",
    alpha=0.8,
    label="Observed outcomes",
)

# # true probabilities:
# x = np.linspace(n.min() - 0.1, n.max() + 0.1,num=21)
ax.plot(
    n,
    p_values,
    lw=2,
    ls="--",
    color="#565C6C",
    alpha=0.8,
    label="True prob.",
)

ax.set_xlabel("Predictor")
ax.set_ylabel("Prob. of success")
ax.set_title("Out-of-sample Predictions")
ax.legend(fontsize=10, frameon=True, framealpha=0.5);