Hello: I am trying to bound my output to be above 0 in a hierarchical model that relies upon observed samples (which are all above 0).
output = pm.Normal('output' + repfile,mu = (muRVs[j] + moise_mus),sigma= sigmaRVs[j]+noise_sigmass, observed=samples)
I can’t wrap this in a bounded variable because I am using the observed term. I looked at the censored data example but I am still unclear how this is working. I’m new to Pymc3 and Bayesian statistics. I see that observed in the example only looks at the arguments within a certain range. What role is right_censored = pm.Potential(‘right_censored’, censored_right_likelihood(mu,sigma,n_right_censored,high))? Thank you!
The censoring example code is for the situation like: you have a device taking a measurement, but the device has some physical limit - for example, it cannot take measure >10. Which means that in your observation you have values that are 10s but you don’t know whether they are 10 or >10.
What a censoring model does is that, it takes all the observed values that are 10s and compute the integral of the tail. The intuition here is that since the actual value could be 10 and above (but we dont know for sure as the device is capped), we take the expectation of the value >=10 (instead of inputting 10 as the observed). And this expectation is equal to taking the integral of the tail cdf(value>=10 | parameters).
You can find some more detail discussions about censoring model in this post: Truncated Inverse normal distribution (also known as Wald distribution)
Thanks @junpenglao . I know that the observations will always be greater than 0. However, due to the mu, and sigma values from the upper levels of the model passed into the Normal distribution, the output takes on some negative values.
I may be confused, but isn’t this different from the censored values and what is done in the example? Here, only my output needs to be censored, not my observations. Do I need a truncating model vs. a censored model? I think I want to modify the normal distribution such that all output less than 0 have no weight. Would I still need to rewrite the PDF/CDF? Sorry, if I totally missed the point: I appreciate your help!
Since you want something like a bounded variable but with observed, what you are looking for is a truncated distribution. If you just treat the truncated distribution as any other distribution, you want to find out the log-likelihood (so that you can at least wrap it in DensityDist and used it in PyMC3).
Have a look at the wikipedia page of the truncated normal distribution, you would see the truncated distribution is generated by dividing the normal distribution with the normal CDF within the bound so that the distribution integrates to 1. You could do that by adding the -log(\sigma*(-\frac{1}{2}-erf(\frac{lowerbound - \mu}{\sqrt{2}*\sigma}))) to your model using pm.Potential
, which actually gives you something quite similar to the censor_example.py