ValueError: p < 0, p > 1 or p contains NaNs when introducing hierarchical effects for Binomial regression

Hi all,

I very consistently run into an issue when trying to introduce random effects into models like binomial regression. My model samples fine without the random effect, but when I try to add a zero-sum normal random effect for an indicator variable, as follows:

preseason_idx, preseason_indicators = pd.factorize(pandas_data['preseason_only'])

coords = {'kicker': kickers, 'preseason_dummy': preseason_indicators}

with pm.Model(coords=coords) as preseason_test_model:
    beta_params = pm.find_constrained_prior(
        pm.Beta,
        lower=0.7,
        upper=1,
        mass=0.9,
        init_guess={"alpha": 10, "beta": 3},
    )

    p = pm.Beta('p', **beta_params, dims='kicker')
    mu_preseason = pm.ZeroSumNormal('mu_preseason', sigma=1, dims='preseason_dummy')
    p_adj = pm.Deterministic('p_adj', p[kickers_idx] + pm.math.invlogit(mu_preseason[preseason_idx]))
    y = pm.Binomial('y', n=attempts, p=p_adj, observed=makes, dims='kicker')

with preseason_test_model:
    preseason_test_model_trace = pm.sample_prior_predictive(1000)

I get the following error:

ValueError: p < 0, p > 1 or p contains NaNs. Apply node that caused the error: binomial_rv ....

I understand what is happening (or think I do!), i.e. that adding the inverse logit of the zero-sum normal pushes 'p_adj' outside of the unit interval. Obviously this beta distribution is concentrated close to 1, which could contribute to that, but I'm mostly following examples like this notebook by Chris Fonnesbeck about hierarchical models, in which several variables are fit like this. I have tried different combinations of logit and inverse logit, and have tried this with several models in the past, and I always get this error. Any help would be greatly appreciated!

Hi, I think you are already on the right track. If you add two numbers that each lie in the (0, 1) interval, there is no guarantee that the sum stays between 0 and 1 (e.g. 0.9 + 0.4 = 1.3).

One strategy for overcoming this is to handle all your additive effects in an unconstrained space, add them up, and only then apply the inverse logit transformation. In Chris' notebook, he takes a similar tactic with the random effects. Consider this example from the notebook you linked:

with pm.Model(coords=coords) as position_means_model:

    # Group means
    m_mu = pm.Normal('m_mu', mu=-2, sigma=1)
    s_mu = pm.HalfNormal('s_mu', 1)
    mu = pm.Normal('mu', mu=m_mu, sigma=s_mu, dims='position') # this is a normal rather than a beta

    # Individual random effects
    sigma = pm.HalfNormal('sigma', 1)
    epsilon = pm.Normal('epsilon', mu=0, sigma=sigma, dims='batter')

    p = pm.Deterministic('p', pm.math.invlogit(mu[position_idx] + epsilon[batter_idx]), dims='obs') # the invlogit wraps all the effects together!

    y = pm.Binomial('y', n=pa, p=p, observed=hr)

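Applied to your model, a minimal sketch of the same pattern might look something like the following. I'm keeping your variable names (kickers_idx, preseason_idx, attempts, makes) and the mu=2, sigma=1 prior on the per-kicker intercept is just a placeholder you would want to tune:

with pm.Model(coords=coords) as preseason_test_model:
    # Per-kicker baseline on the (unconstrained) logit scale
    mu_kicker = pm.Normal('mu_kicker', mu=2, sigma=1, dims='kicker')

    # Zero-sum preseason effect, also on the logit scale
    mu_preseason = pm.ZeroSumNormal('mu_preseason', sigma=1, dims='preseason_dummy')

    # Add everything up first, then squash into (0, 1) with a single invlogit
    p_adj = pm.Deterministic(
        'p_adj', pm.math.invlogit(mu_kicker[kickers_idx] + mu_preseason[preseason_idx])
    )

    y = pm.Binomial('y', n=attempts, p=p_adj, observed=makes, dims='kicker')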
The slightly annoying bit is that you have to think about your priors in what might be an unintuitive space. You would have to drop the beta distribution and find a normal distribution that behaves much like your beta once the inverse logit is applied.
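You can still lean on pm.find_constrained_prior for that, just with the bounds mapped onto the logit scale first. Something like the sketch below; note that the 0.99 upper bound is my assumption, since logit(1) is infinite you need a cap strictly below 1:

from scipy.special import logit

normal_params = pm.find_constrained_prior(
    pm.Normal,
    lower=logit(0.7),   # ~0.85 on the logit scale
    upper=logit(0.99),  # ~4.6; can't use 1.0 exactly because logit(1) = inf
    mass=0.9,
    init_guess={"mu": 2, "sigma": 1},
)

# Then, inside the model:
# mu_kicker = pm.Normal('mu_kicker', **normal_params, dims='kicker')

This gives you a normal prior whose inverse logit puts roughly 90% of its mass between 0.7 and 0.99, which should behave much like the beta you started from.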