Biased (?) results from Poisson regression model

Hello,
I am using PyMC3 to perform Poisson regression and I am getting results that don’t make sense to me. I am trying to model counts of a certain type of “adverse health event” (exacerbations of a disease) based on a number of covariates. Some of these are factors, while others are continuous. There is also an offset/exposure for the length of observation time for each subject.

The issue is that after fitting the model (using NUTS), which shows no major diagnostic issues such as divergences or large R-hat values, my posterior predictive samples produce counts that are roughly half of the observed counts. That being said, I did have to pump the target_accept up to 0.99 to get the model to fit without numerical errors.

Here is the model:

import numpy as np
import pymc3 as pm

with pm.Model() as model:
    model_intercept = pm.Normal('model_intercept', mu=0, sigma=1000)
    factor_coefs = pm.Normal('factor_coefs', mu=0, sigma=1000, shape=len(dummies.columns))
    covariate_coefs = pm.Normal('covariate_coefs', mu=0, sigma=1000, shape=len(covariates.columns))

    mu = pm.Deterministic(
        'mu',
        model_intercept + 
        pm.math.dot(dummies, factor_coefs) + 
        pm.math.dot(covariates, covariate_coefs) +
        np.log(exposure)
    )

    err = pm.HalfCauchy('err', beta=10)
    log_rate = pm.Normal('log_rate', mu=mu, sigma=err, shape=n_obs)
    numex = pm.Poisson('numex', mu=pm.math.exp(log_rate), observed=numex_obs)

I then fit the model with:

with model:
    trace = pm.sample(2000, tune=2000, init='advi', target_accept=0.99)
    ppc = pm.sample_posterior_predictive(trace)

Here is the plot showing the actual observed counts (x-axis) vs the posterior predictive counts (y-axis; averaged over the posterior samples):
[scatter plot of observed counts vs. mean posterior predictive counts]
As you can probably see, the trend appears to be roughly correct (and the model coefficients do make sense to a large extent), but the counts seem to be biased quite strongly toward being too low. I have also tried using a negative binomial distribution for the counts, but the result is essentially the same.
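
For reference, the plot was produced roughly like this (a sketch; it assumes ppc['numex'] holds the posterior predictive draws from the snippet above, and uses matplotlib):

import matplotlib.pyplot as plt
import numpy as np

# Average over the posterior predictive draws to get one predicted count per subject
ppc_mean = ppc['numex'].mean(axis=0)

plt.scatter(numex_obs, ppc_mean)
lim = max(np.max(numex_obs), np.max(ppc_mean))
plt.plot([0, lim], [0, lim], linestyle='--')  # y = x reference line
plt.xlabel('observed counts')
plt.ylabel('mean posterior predictive counts')
plt.show()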

Any suggestions on the model, or reasons why this is happening would be greatly appreciated. Thanks!

Hi Andrew,

I think the problem comes from your priors. More specifically, a sigma of 1000 is usually way too flat for Poisson models (to be honest, I’m even surprised NUTS manages to start sampling, that’s impressive :wink: ).

Indeed, because of the exponential link, values in the linear model become very big, very fast, so you need much more regularizing priors. If you do some prior predictive checks you’ll see this phenomenon very clearly.
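
Something like this should show it (a rough sketch, reusing your model context and variable names):

import numpy as np
import pymc3 as pm

with model:
    # Draw from the priors only, without conditioning on the data
    prior = pm.sample_prior_predictive(samples=500, var_names=['mu'])

# mu is on the log scale: with sigma=1000 priors its draws routinely land in the
# hundreds or thousands, i.e. exp(mu) is astronomically large
print(np.percentile(prior['mu'], [5, 50, 95]))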

For the same reason, I would use an exponential prior rather than half Cauchy for your err parameter.
Hope this helps :vulcan_salute:

Thanks for the suggestion! That makes a lot of sense. I updated the coefficient priors to use sigmas of 10 and replaced the HalfCauchy with an Exponential (lambda=0.5). The model actually exhibited divergences with these parameters, but they disappeared when I pushed target_accept even further, to 0.999. Interestingly, the result is essentially the same.
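
In code, the change was roughly this (inside the same model context, with the same variable names as above):

factor_coefs = pm.Normal('factor_coefs', mu=0, sigma=10, shape=len(dummies.columns))
covariate_coefs = pm.Normal('covariate_coefs', mu=0, sigma=10, shape=len(covariates.columns))
err = pm.Exponential('err', lam=0.5)  # replaces the HalfCauchy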

This makes me think that the issue isn’t so much with the parameterization but something related to the data itself. Additional suggestions still very welcome…

A sigma of 10 can still be a lot on the exponential scale, especially for slopes. I think the best solution is to do prior predictive checks, to choose the priors most appropriate to your use case.

Here is a notebook I wrote on how to do that for multinomial-softmax regression, but you should be able to understand the concept and adapt it.

Also, did you standardize your predictors? This often helps sampling, and even the interpretation of the results.
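
For example, something like this (assuming covariates is a pandas DataFrame of the continuous predictors):

# Center and scale each continuous predictor to mean 0, sd 1
covariates_std = (covariates - covariates.mean()) / covariates.std()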

Hi @AlexAndorra, sorry for the slow reply on this. Thank you for the link to your notebook. I did some prior predictive sampling and, as you mentioned, it revealed that insane values were being sampled even with sigmas of a modest size (e.g. 10). After moving to more regularizing priors (e.g. sigma=1), I found that switching the model error term to an InverseGamma prior with alpha=5 and beta=5 worked much better from a computational perspective. It sped up sampling considerably and eliminated even the occasional divergence.

Even though the inverse gamma distribution is the conjugate prior for the variance of a normal distribution, I wouldn’t have expected that to be particularly important for MCMC sampling purposes.

Finally, instead of using a single error term for all of the log_rate's, I replicated the prior for each observation. Originally I had thought that the scalar error term would be broadcast up to the shape of mu, but that just means a single error scale is shared across all of the log_rate's. This makes sense now that I realize it, but I’m still not sure I did the right thing here… I’m worried that I’m conflating the error associated with the log_rate parameter with the residual of the fit, but the posterior predictive check looks a lot better when each observation's log_rate gets its own sigma. Any insight on this would be helpful.

The updated model:

with pm.Model() as model:
    model_intercept = pm.Normal('model_intercept', mu=0, sigma=1)
    factor_coefs = pm.Normal('factor_coefs', mu=0, sigma=1, shape=len(dummies.columns))
    covariate_coefs = pm.Normal('covariate_coefs', mu=0, sigma=1, shape=len(covariates.columns))

    mu = pm.Deterministic(
        'mu',
        model_intercept + 
        pm.math.dot(dummies, factor_coefs) + 
        pm.math.dot(covariates, covariate_coefs) +
        np.log(exposure)
    )

    err = pm.InverseGamma('err', alpha=5, beta=5, shape=n_obs)
    log_rate = pm.Normal('log_rate', mu=mu, sigma=err, shape=n_obs)
    numex = pm.Poisson('numex', mu=pm.math.exp(log_rate), observed=numex_obs)