Sampling error: Initial evaluation of model at starting point failed!

Hi everyone,

I’m quite new to PyMC3 and am currently trying to implement the following election-prediction model. However, I’m getting this error:
pymc3.exceptions.SamplingError: Initial evaluation of model at starting point failed!

Do you have any tips on this? The code and the dataset are below.

# Explore and Sample the Parameter Space
import pandas as pd       
import numpy as np
import pymc3 as pm
from pymc3.math import invlogit
import theano.tensor as tt
import xarray as xr
import arviz as az
    

PRIOR_N = 50
# results used as prior
b_pct = .25
l_pct = .35

# normalize the split because
b_pct_norm = b_pct / (b_pct + l_pct)
l_pct_norm = l_pct / (b_pct + l_pct)

b_pct_norm, l_pct_norm

alpha = int(l_pct_norm * PRIOR_N)
beta = PRIOR_N - alpha

alpha, beta


data = [['A', 243.945, 30.896], 
        ['B', 381.126, 186.751], 
        ['C', 596.776, 301.596],
        ['D', 880.126, 449.231],
        ['E', 477.420, 238.710],
        ['F', 425.333, 232.001],
        ['G', 418.260, 215.598]]

# Create the pandas DataFrame
df = pd.DataFrame(data, columns = ['pollster', 'samplesize', 'num_votes'])

print(df)

with pm.Model() as model:
    
    phi = pm.Beta('phi', alpha=alpha, beta=beta)
    
    kappa_log = pm.HalfNormal('kappa_log', sigma=10)
    kappa = pm.Deterministic('kappa', tt.exp(kappa_log))
    
    thetas = pm.Beta(
        'thetas', 
        alpha=phi*kappa, 
        beta=(1.0-phi)*kappa, 
        shape=len(df)
    )
    
    y = pm.Binomial(
        'y', 
        n=df['samplesize'], 
        p=thetas, 
        observed=df['num_votes']
    )
    
    
with model:
    poll_samples = pm.sample(5000, tune=5000, cores=2, target_accept=0.98)

One big hint about why this error occurs is in this message:

Initial evaluation results:
phi_logodds__        0.33
kappa_log_log__     -0.77
thetas_logodds__    16.55
y                    -inf
Name: Log-probability of test_point, dtype: float64

This suggests that the critical failure is in y, the observed variable: its log-probability at the starting point is -inf. Looking at the binomial specification for y, it seems that you are telling the model that pollster ‘A’ had obs=308964 “successes” in n=243.945559 attempts. Given that this doesn’t make much sense (at least it doesn’t to me), it seems reasonable that the sampler gave up.
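
To illustrate why one impossible observation is enough to give -inf, here is a minimal sketch of a binomial log-probability written from the textbook formula. It is not PyMC3's own implementation: the function name binomial_logpmf is made up for this example, and it assumes integer counts.

```python
import math


def binomial_logpmf(k, n, p):
    """Log-probability of k successes in n trials (hypothetical helper).

    Returns -inf whenever (k, n, p) falls outside the binomial support.
    """
    if not (0 <= k <= n) or not (0.0 <= p <= 1.0):
        return -math.inf
    # Handle the boundary probabilities separately to avoid log(0).
    if p == 0.0:
        return 0.0 if k == 0 else -math.inf
    if p == 1.0:
        return 0.0 if k == n else -math.inf
    # log of the binomial coefficient, computed via log-gamma for stability
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_choose + k * math.log(p) + (n - k) * math.log1p(-p)


# More "successes" than attempts: outside the support, so the result is -inf.
impossible = binomial_logpmf(308964, 244, 0.5)
# A count that fits inside the number of attempts gives a finite value.
plausible = binomial_logpmf(31, 244, 0.5)
print(impossible, plausible)
```

Because the model's total log-probability is a sum over all observations, a single -inf term makes the whole starting point invalid.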

Thanks for noticing. The number was missing a dot. But even with that fixed, the error persists.

I’m not sure what dot you’re referring to. For binomial parameters, n must be greater than or equal to the observed number of “successes”. For pollster A, you had 308964 successes and ~244 attempts.
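
A quick way to catch this kind of problem before sampling is to check the data frame directly. The sketch below is only an illustration: the helper name find_invalid_binomial_rows is made up, row A uses the 308964 figure quoted above together with the sample size from the posted code, and rows B and C are copied from the posted code.

```python
import pandas as pd


def find_invalid_binomial_rows(df, n_col="samplesize", k_col="num_votes"):
    """Return rows whose observed count lies outside [0, n] (hypothetical helper)."""
    out_of_range = (df[k_col] < 0) | (df[k_col] > df[n_col])
    return df[out_of_range]


polls = pd.DataFrame(
    [
        ["A", 243.945, 308964],   # the value missing its dot
        ["B", 381.126, 186.751],
        ["C", 596.776, 301.596],
    ],
    columns=["pollster", "samplesize", "num_votes"],
)

invalid = find_invalid_binomial_rows(polls)
print(invalid)

# After putting the dot back, no row should violate the constraint.
fixed = polls.copy()
fixed.loc[fixed["pollster"] == "A", "num_votes"] = 30.896
still_invalid = find_invalid_binomial_rows(fixed)
print(len(still_invalid))
```

If the second check comes back empty and the error still appears, the cause is probably somewhere other than the n >= successes constraint.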

Yes, but what about 30.8964 successes in ~244 attempts? That should work, but it doesn’t! That is what I’m trying to figure out.

I took some random guesses about where the decimal points were supposed to be, and it seems to work fine for me. What does your data look like now that it’s corrected?