Why would likelihood be -inf in pm.sample() but not in pm.sample_prior_predictive()?

I have a model where I now regret trying too many new things at once.

I am trying to estimate parameters that capture the influence of two independent variables, while also getting the model to weight the likely trade-off between them across another index variable.

I also have observations that can only be trusted (and that I only care about matching) at an aggregate level, so I have implemented a scan version of pandas df.groupby('id').sum() over all columns and am then attempting a multinomial likelihood. Code below.
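To illustrate the idea, here is a minimal NumPy sketch of the bincount/groupby-sum equivalence on toy data (the ids and values are made up, not the actual model inputs):

import numpy as np
import pandas as pd

# toy data: five rows, two value columns, grouped by an integer id
ids = np.array([0, 0, 1, 2, 1])
values = np.arange(10.0).reshape(5, 2)

# pandas version of the aggregation
df = pd.DataFrame(values, columns=["c0", "c1"])
df["id"] = ids
by_pandas = df.groupby("id").sum()

# bincount-with-weights version, one column at a time (what the scan does)
by_bincount = np.stack(
    [np.bincount(ids, weights=values[:, j]) for j in range(values.shape[1])],
    axis=1,
)

assert np.allclose(by_pandas.values, by_bincount)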

When I call pm.sample(), my model gives the error "SamplingError: Initial evaluation of model at starting point failed!" and says that the "counts" variable evaluates to -inf.

I usually chase these errors down using the prior predictive, but that is working perfectly. Does anyone have any pointers, or know why it works fine (and there are no -inf or null values in "counts") when I run sample_prior_predictive?
I checked this via idata.prior_predictive.counts.isnull().max() and np.isinf(idata.prior_predictive.counts).max(), and both return False.

import aesara
import aesara.tensor as at
import pymc as pm


def bc_on_col(j, matrix, IDS):
    # groupby-sum of column j of `matrix`, grouped by the integer ids in `IDS`
    return at.extra_ops.bincount(x=IDS, weights=matrix[:, j])


with pm.Model(coords=coords) as distribution_calibration:
    a = pm.Gamma("a", alpha=6.0, beta=2.0)
    b = pm.Gamma("b", alpha=1.5, beta=4.5)
    F = pm.Gamma("F", alpha=a, beta=b, dims="area_NAME")
    Pr_D_dis = pm.Deterministic(
        "Pr_D_dis",
        at.exp(pm.logp(F, hts_destination_gc_dollars.values)),
        dims=("tid", "area_NAME"),
    )

    # a DESTCAT-based factor for the importance of attraction power:
    # <1 makes differences less pronounced, >1 more pronounced
    ap_importance = pm.Lognormal("ap_importance", mu=0, sigma=0.2, dims="DESTCAT")

    # attraction power, scaled by importance
    Pr_D_ap = at.pow(total_ap.values, ap_importance[destcad_ids].dimshuffle(0, "x"))
    # re-normalise
    Pr_D_ap = Pr_D_ap / at.sum(Pr_D_ap, axis=1, keepdims=True)

    Pr_D = Pr_D_ap * Pr_D_dis
    Pr_D = pm.Deterministic(
        "Pr_D", Pr_D / at.sum(Pr_D, axis=-1, keepdims=True), dims=("tid", "area_NAME")
    )

    # groupby-sum of the weighted probabilities over estOcat ids, column by column
    rolled_up_O, _ = aesara.scan(
        fn=bc_on_col,
        sequences=[at.arange(Pr_D.shape[1])],
        non_sequences=[Pr_D * tid_weights, estOcat_ids],
    )

    rolled_up_O = rolled_up_O / at.sum(rolled_up_O, axis=1, keepdims=True)
    rolled_up_O2 = pm.Deterministic(
        "rolled_up_O2", rolled_up_O.T, dims=("estOcat", "area_NAME")
    )
    counts = pm.Multinomial(
        "counts",
        n=obs_n,
        p=rolled_up_O2,
        dims=("estOcat", "area_NAME"),
        observed=obs_counts.values,
    )

    idata = pm.sample_prior_predictive(samples=200, random_seed=rng)
    # idata = pm.sample()

Welcome!

The message you are seeing when calling pm.sample() likely suggests that obs_counts.values contains data that is impossible according to your model at the starting point, not that there is missing data or something intrinsically wrong with the data itself (although things like missingness can sometimes be "impossible" according to a model). That is why sampling from the prior is fine: the observed data is ignored at that point.
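One way to narrow it down is to evaluate the log-probability of each model variable at the starting point and see which term comes out -inf. A minimal sketch, assuming PyMC v4 and the model context named distribution_calibration from your code:

# log-probability of every model variable at the initial point;
# the entry showing -inf (or nan) is the term that breaks pm.sample()
print(distribution_calibration.point_logps())

The prior predictive never evaluates that observed likelihood term (it only draws forward from the priors), which is why everything looks fine there.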

Hey Cluhmann, thanks for the response. I have checked through the array used for observed and I can't see anything wrong with it, so I am still struggling. Even an old Poisson model that used to work for me is now giving the same error. I just don't understand: if the prior predictive can evaluate a reasonably sensible value for the likelihood, one capable of being compared to the observed data, why is the tuning step getting -inf before I even start sampling? Is there something else I should check in my observed data?
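For context, the kind of checks I have been running on the observed table look roughly like this (a sketch; the shapes and alignment of obs_counts and obs_n are my own assumptions, and these are just the usual ways a Multinomial likelihood goes to -inf):

import numpy as np

counts = np.asarray(obs_counts.values)   # observed table, assumed (estOcat, area_NAME)
n = np.asarray(obs_n)                    # trials per row, assumed to line up with counts

print("any negative counts:   ", (counts < 0).any())
print("any non-integer counts:", (counts != np.round(counts)).any())
print("row sums equal n:      ", np.allclose(counts.sum(axis=-1), n))
# a p entry of exactly 0 (or nan) where the matching count is positive also gives
# logp = -inf, so rolled_up_O2 evaluated at the starting point is worth checking too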

Just read your responses here: https://discourse.pymc.io/t/unexpected-initial-evaluation-results/9763
Using init='advi' at least gets it running, so hopefully I can then see what's wrong with it in the trace. Thanks for this response too!
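For reference, the call that gets past the initialisation error is roughly this (a sketch; switching the init only changes the starting point, so I still need to find what makes the likelihood zero):

with distribution_calibration:
    # ADVI initialisation picks the starting point from a fitted approximation
    # instead of jittering around the default initial point
    idata = pm.sample(init="advi", random_seed=rng)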
