Hi there,
I know this is a question on the FAQ, so please point me back there if there is something obvious that I missed.
I have a model that is generating a ParallelSamplingError: Bad initial energy error. Here it is:
notifications_sent = 2250380
r_o = 687795
v_o_and_r_o = 26198
v_o = 248
with pm.Model() as reception_model:
alpha = pm.Beta('alpha', alpha=1, beta=1)
beta = pm.Beta('beta', alpha=1, beta=1)
gamma = pm.Beta('gamma', alpha=1, beta=1)
r = pm.Binomial('received', n=notifications_sent, p=alpha)
r_r = pm.Binomial('received_recorded', n=r, p=gamma, observed=r_o+v_o_and_r_o)
v = pm.Binomial('viewed', n=r, p=beta)
v_r = pm.Binomial('viewed_recorded', n=v, p=gamma, observed=v_o+v_o_and_r_o)
trace = pm.sample(1000, tune=500, chains=2)
The reception_model.check_test_point() has -inf on a few RVs as expected:
alpha_logodds__ -1.39
beta_logodds__ -1.39
gamma_logodds__ -1.39
received -5.23
viewed -4.88
received_recorded -inf
viewed_recorded -inf
Checking reception_model.test_point reveals this:
{'alpha_logodds__': array(0., dtype=float32),
'beta_logodds__': array(0., dtype=float32),
'gamma_logodds__': array(0., dtype=float32),
'received': array(11078, dtype=int16),
'viewed': array(5539, dtype=int16)}
So the logodds for received_recorded and viewed_recorded result in -inf because n for each Binomial is fewer than the observed values. Following a recommendation from a FAQ answer, one approach is to set the testval for received and viewed so that they are at least as large as the observed values. So I updated the model like so:
notifications_sent = 2250380
r_o = 687795
v_o_and_r_o = 26198
v_o = 248
with pm.Model() as reception_model:
alpha = pm.Beta('alpha', alpha=1, beta=1)
beta = pm.Beta('beta', alpha=1, beta=1)
gamma = pm.Beta('gamma', alpha=1, beta=1)
r = pm.Binomial('received', n=notifications_sent, p=alpha, testval=r_o+v_o_and_r_o+1000, dtype='int64')
r_r = pm.Binomial('received_recorded', n=r, p=gamma, observed=r_o+v_o_and_r_o, dtype='int64')
v = pm.Binomial('viewed', n=r, p=beta, testval=v_o+v_o_and_r_o+1000, dtype='int64')
v_r = pm.Binomial('viewed_recorded', n=v, p=gamma, observed=v_o+v_o_and_r_o, dtype='int64')
This makes things worse
. reception_model.check_test_point() returns
alpha_logodds__ -1.39
beta_logodds__ -1.39
gamma_logodds__ -1.39
received -inf
viewed -inf
received_recorded -inf
viewed_recorded -14734.69
the test points are:
{'alpha_logodds__': array(0., dtype=float32),
'beta_logodds__': array(0., dtype=float32),
'gamma_logodds__': array(0., dtype=float32),
'received': array(714993),
'viewed': array(27446)}
So I guess at this point my next step should be to re-think my priors or my model structure. But I’m not sure how to think through what’s going wrong with my current model. I’d appreciate any guidance on nexts steps. Hopefully it’s just something I missed in the FAQ!
Cheers,
Jesse