Samples from prior appear to have wrong distribution

ckrapu · October 7, 2018, 3:39am

Hi,

I was creating a logistic regression model and found that the observed 0/1 values didn’t match up with the probabilities which were supplied as parameters to a Bernoulli distribution. The code block below gives a minimal reproducible example:

import numpy as np
import pymc3 as pm
import scipy.stats  # a bare "import scipy" does not reliably expose scipy.stats

N = 100
with pm.Model() as minimal_example:
    X         = pm.Normal('x',shape = N)
    intercept = pm.Normal('intercept')
    #intercept = 0
    
    p = pm.Deterministic('p',pm.math.sigmoid(X + intercept))
    Y = pm.Bernoulli('Y',p = p,shape = N)
    
    samples = pm.sample_prior_predictive(samples=1)

print('Observed +1 values: ',np.sum(samples['Y']))
print('Expected +1 values: ',np.sum(samples['p']))
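# Rough check: Binomial(n, 0.5) with n = 2 * sum(p) has mean sum(p); n is rounded because it must be an integer
n_trials = int(round(2 * np.sum(samples['p'])))
print('Corresponding binomial CDF:',scipy.stats.binom.cdf(np.sum(samples['Y']),n_trials,0.5))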

I ran this and got expected and observed counts of +1 values that were wildly incompatible, with corresponding binomial CDF values of almost exactly 0 or 1. However, if I replace the intercept term above with the commented-out line intercept = 0, then the expected and observed counts of Y line up as they should.
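For comparison, here is a minimal NumPy-only sketch of the same generative process, with no PyMC3 involved. It assumes that Y is drawn from exactly the p values that are reported, which is the behaviour one would expect from a consistent prior sampler. The function name prior_predictive_check and the z-score summary are my own illustration, not part of the original model; the z-score uses the Poisson-binomial standard deviation sqrt(sum(p * (1 - p))) in place of the rough binomial CDF above.

```python
import numpy as np

def prior_predictive_check(n=100, seed=0):
    """Draw one prior sample by hand and compare sum(Y) with sum(p).

    Hypothetical helper for illustration; it mirrors the model above
    but is not PyMC3 code.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)                       # X ~ Normal(0, 1), one value per observation
    intercept = rng.normal()                     # a single scalar intercept shared by all observations
    p = 1.0 / (1.0 + np.exp(-(x + intercept)))   # logistic (sigmoid) link
    y = rng.binomial(1, p)                       # Y ~ Bernoulli(p), using the same p as reported
    observed = y.sum()
    expected = p.sum()
    sd = np.sqrt(np.sum(p * (1.0 - p)))          # std. dev. of a sum of independent Bernoullis
    z = (observed - expected) / sd
    return observed, expected, z

observed, expected, z = prior_predictive_check()
print('Observed +1 values: ', observed)
print('Expected +1 values: ', expected)
print('z-score:', z)
```

When Y and p come from the same draw of X and intercept, the z-score should stay within a few units of zero. A large |z| would suggest that Y was generated from different parameter values than the ones returned alongside it.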

TL;DR: logistic regression probabilities and observations are mismatched when sampling from a prior.

junpenglao · October 7, 2018, 8:25am

This is likely related to https://github.com/pymc-devs/pymc3/issues/3210, which we are currently trying to fix.

ckrapu · October 8, 2018, 12:48am

Sure, I’ll keep tabs on that issue page then. Thanks!
