In image 1, my observed values are always greater than zero, but I still used a normal distribution as the likelihood.
In image 2, I changed the likelihood to an exponential distribution, and I also tried a gamma distribution.
My question: in image 1 the model produces extra predictions that are useless, but in image 2 the predictions do not cover the observed distribution line.
For the model in image 2, I tried changing many parameters, prior values, and prior distributions, but the predictions still do not cover the whole graph.
So which model is correct and which is not?
Which one should I use?
There are no “correct” or “incorrect” models. There are models you are happy with and models you are unhappy with. My suggestion would be to do some prior predictive checking (not posterior predictive checking). That will help you figure out whether your model is behaving in ways that make sense to you.
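As a rough illustration of what a prior predictive check does, here is a library-free sketch: draw parameters from the prior, draw fake data from the likelihood, and look at the result before touching the real data. The HalfNormal prior on the rate and its scale are assumptions made up for illustration, not taken from the model in this thread. In PyMC, `pm.sample_prior_predictive` does the same job on the actual model.

```python
import random
import statistics

random.seed(0)

def prior_predictive_exponential(n_draws=2000, prior_rate_scale=1.0):
    """Simulate data from a hypothetical model: lam ~ HalfNormal, y ~ Exponential(lam)."""
    sims = []
    for _ in range(n_draws):
        # HalfNormal draw: absolute value of a zero-mean normal draw.
        lam = abs(random.gauss(0.0, prior_rate_scale))
        lam = max(lam, 1e-6)  # guard against a rate of exactly zero
        sims.append(random.expovariate(lam))
    return sims

sims = prior_predictive_exponential()
share_below_one = sum(s < 1 for s in sims) / len(sims)

print("median simulated value:", statistics.median(sims))
print("share of simulated values below 1:", share_below_one)
```

If the simulated values pile up somewhere your real data never would, the priors or the likelihood need rethinking before any sampling is done.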
I can, but I would need to see what the prior predictive looks like and even then, you would be the best person to evaluate whether the model is behaving in the way you wish.
I took a look at your model. It’s hard to tell what your graph is actually showing though. Could you paste your code for the second visualization? If it’s just showing a distribution plot of all Y_obs, I’m not sure how meaningful it is for figuring out what’s going on. I will say, though, your data seem to contain a lot of zeros and an exponential distribution probably won’t work for modeling counts with excess zeros (or any count data, for that matter).
To answer your question… like @aseyboldt said in your original post, I would recommend using a Negative-Binomial distribution. It’s like the Poisson distribution for count data, except that it allows you to handle overdispersion (variance larger than the mean).
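To make the overdispersion point concrete, here is a small sketch that simulates negative-binomial counts as a gamma-Poisson mixture and compares them with plain Poisson counts of the same mean. The values `mu=5` and `alpha=2` are hypothetical. Under the mu/alpha parameterization (the one PyMC's `NegativeBinomial` accepts), the variance is mu + mu²/alpha, so it should come out well above the Poisson variance.

```python
import math
import random
import statistics

random.seed(1)

def poisson_draw(lam):
    """Knuth's multiplication method; adequate for small-to-moderate lam."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= random.random()
        if p <= threshold:
            return k
        k += 1

def negbin_draw(mu, alpha):
    """Negative binomial as a Poisson whose rate is gamma distributed."""
    # random.gammavariate takes (shape, scale); mean of the rate is mu.
    lam = random.gammavariate(alpha, mu / alpha)
    return poisson_draw(lam)

mu, alpha = 5.0, 2.0  # hypothetical values for illustration
nb = [negbin_draw(mu, alpha) for _ in range(20000)]
pois = [poisson_draw(mu) for _ in range(20000)]

print("Poisson  mean/var:", statistics.mean(pois), statistics.variance(pois))
print("NegBin   mean/var:", statistics.mean(nb), statistics.variance(nb))
print("theoretical NegBin variance:", mu + mu**2 / alpha)
```

Both samples share the same mean, but the negative-binomial one is much more spread out and produces more zeros, which is the behaviour you want for counts like these.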
If you’re normalizing the data, then you definitely shouldn’t use an exponential distribution. The exponential distribution cannot model negative numbers. Consider leaving the counts as they are and doing something like this: GLM: Negative Binomial Regression — PyMC example gallery
What if we use a gamma distribution instead of the negative binomial?
Different regions have different populations, so we have to compute values per 100k, and that is our normalization.
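For reference, the per-100k normalization described here amounts to the following. The counts and populations are made-up numbers. Two things are worth noticing: the normalized values are no longer integers, so a count likelihood cannot be applied to them directly, and regions with zero counts stay at exactly zero, which sits on the boundary of the gamma's support.

```python
# Hypothetical data: raw counts and population for four regions.
counts = [0, 3, 12, 7]
population = [50_000, 120_000, 800_000, 95_000]

# Rate per 100k inhabitants.
rate_per_100k = [c / p * 100_000 for c, p in zip(counts, population)]

def expected_count(rate, pop):
    """Convert a per-100k rate back to an expected raw count for a region."""
    return rate * pop / 100_000

for c, p, r in zip(counts, population, rate_per_100k):
    print(f"count={c:>3}  population={p:>7}  per_100k={r:.3f}")
```

One option, if you want to keep a count likelihood, is to model the per-100k rate and convert it to an expected count per region (as in `expected_count`) so the raw counts stay as the observed data.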
I think this point has been made several times along the thread, but you really need to look at the prior predictive to know if your model is behaving as expected. If it does not generate the data you expect to see, then there is no way for your posterior to be any different. You seem to be unhappy with the high probability of zero in your PPC. However, this is completely expected with an exponential. You can certainly normalize to per 100k and then use a different distribution, including gamma. However, again, depending on your priors for alpha/beta, you may still have a model with high probability near zero. It seems that in the model link you posted above, you are defining alpha=1. A gamma with alpha=1 is just an exponential, so its density is highest at zero. This is likely inconsistent with what you expect from your data. Are you expecting something that looks more like a normal distribution centered around some positive value? Take a look at the visualization and see what you think: https://www.pymc.io/projects/docs/en/stable/api/distributions/generated/pymc.Gamma.html
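Here is a quick sketch of the alpha=1 issue, comparing two gamma distributions with the same mean but different shape. The mean of 10 and the alpha=5 alternative are hypothetical choices. Note that Python's `random.gammavariate` takes a scale parameter, whereas PyMC's `Gamma` takes `beta` as a rate (rate = 1 / scale).

```python
import random

random.seed(2)

n = 20000
target_mean = 10.0  # hypothetical mean for illustration

def gamma_sample(alpha, mean, size):
    """Draw gamma values with the given shape and mean (scale = mean / alpha)."""
    scale = mean / alpha
    return [random.gammavariate(alpha, scale) for _ in range(size)]

draws_alpha1 = gamma_sample(1.0, target_mean, n)  # same as an exponential
draws_alpha5 = gamma_sample(5.0, target_mean, n)  # humped, away from zero

frac_near_zero_alpha1 = sum(x < 1 for x in draws_alpha1) / n
frac_near_zero_alpha5 = sum(x < 1 for x in draws_alpha5) / n

print("alpha=1: share of draws below 1 =", frac_near_zero_alpha1)
print("alpha=5: share of draws below 1 =", frac_near_zero_alpha5)
```

With alpha=1 a noticeable share of draws lands next to zero, while with alpha=5 almost none do, even though both distributions have the same mean.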
Do you want a fixed value of alpha or would you consider placing priors on both alpha/beta? You can also parameterize the gamma using mu/sigma if that makes more sense for your application.
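If the mu/sigma route is more natural for you, the conversion is alpha = mu²/sigma² and beta = mu/sigma² (with beta as a rate), which is the relationship given in the PyMC `Gamma` docs linked above. A small sketch to check it by simulation, with hypothetical values `mu=10` and `sigma=3`:

```python
import random
import statistics

random.seed(3)

def gamma_alpha_beta(mu, sigma):
    """Convert a mean/standard-deviation pair to gamma shape (alpha) and rate (beta)."""
    alpha = mu**2 / sigma**2
    beta = mu / sigma**2
    return alpha, beta

mu, sigma = 10.0, 3.0  # hypothetical values for illustration
alpha, beta = gamma_alpha_beta(mu, sigma)

# random.gammavariate expects a scale, so pass 1 / beta.
draws = [random.gammavariate(alpha, 1.0 / beta) for _ in range(20000)]

print("alpha, beta:", alpha, beta)
print("sample mean:", statistics.mean(draws))
print("sample sd:  ", statistics.stdev(draws))
```

In PyMC you would not need the helper, since `pm.Gamma` accepts `mu` and `sigma` directly; the point is only that a sensible mean and spread on the data scale can be easier to reason about than alpha and beta.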