Hi, I have a simple model where I have observations from a set of Bernoulli trials (i.e. 1s and 0s), and the probability of success is different for each trial. For each trial there is a baseline probability of success, which I have available as an array; however, an unknown constant premium is added to every baseline (in the log odds domain), and it is this premium whose posterior distribution I am trying to find.
The premium lies between -2 and 2, so I use a uniform prior. I then create the relationship between prem and pS, the probability of success in the Bernoulli trial. My code is below:
# base: array of baseline success probabilities; Outcome_obs: array of observed 0/1 outcomes
import numpy as np
import pymc3 as pm

with pm.Model() as model:
    # prior
    prem = pm.Uniform('prem', lower=-2, upper=2)
    base_LO = np.log(base / (1 - base))  # convert array of baselines into log odds
    pS_LO = base_LO + prem               # add the premium to every baseline
    pS = pm.math.invlogit(pS_LO)         # convert back to a probability
    # likelihood function
    Outcome = pm.Bernoulli('Outcome', p=pS, observed=Outcome_obs)
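The posterior for prem can then be drawn with the usual sampling call, for example:
with model:
    trace = pm.sample(2000, tune=1000)  # posterior samples for prem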
I think this is correct and appears to be giving me distributions I am expecting. However I want to modify the system to take into account the fact that the array of baseline probabilities has an error attached to each value, with this error being taken from a normal distribution with mean of zero and a known, fixed standard deviation of 0.2.
I don’t know how to reflect this in the above model. Any help is appreciated.
For these sorts of models, the best approach in my opinion is to write down the math of the generative process. You have:
- The base probability p_{base} and its log odds, logit(p_{base}).
- The error \varepsilon.
- The prem.
Your generative process will look like this:
$$
\varepsilon \sim \text{Normal}(0, 0.2) \\
\text{prem} \sim \text{Uniform}(-2, 2) \\
\text{Outcome} \sim \text{Bernoulli}\left(\text{expit}\left(\varepsilon + \text{prem} + \text{logit}(p_{base})\right)\right)
$$
However, you have to be aware that it will be hard, if not impossible, to distinguish between the effects of \varepsilon and prem (the parameters might be unidentifiable).
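A minimal PyMC3 sketch of this generative process, assuming base and Outcome_obs are the arrays from your post (the other names here are just illustrative), might look like:
import pymc3 as pm

with pm.Model() as gen_model:
    eps = pm.Normal('eps', mu=0, sd=0.2)           # error term on the log-odds scale
    prem = pm.Uniform('prem', lower=-2, upper=2)   # prior on the premium
    p = pm.math.invlogit(pm.math.logit(base) + prem + eps)
    outcome = pm.Bernoulli('outcome', p=p, observed=Outcome_obs)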
Thanks for your response. I think I’m comfortable with the process; I’m just looking for a way to reflect the added uncertainty in the system, and what that means for the range of values prem could take given the observed data.
Say I have 50 data points, so 50 p(base) values measured with error and 50 observed outcomes. I would like to add 50 error terms to my p(base) values, drawn from the normal distribution, and then take a new sample of 50 error terms on every proposal loop that forms the chain.
In an MCMC algorithm that I write from scratch I can just put this within a code section that gets looped on every proposal, so I’m wondering how I would implement this in pymc3. Is something like the following correct?
base_LO = np.log(base / (1 - base))  # convert array of bases into log odds
errLO = pm.Normal('errLO', mu=0, sd=0.2)
pS_LO = base_LO + prem + errLO       # added error term
pS = pm.math.invlogit(pS_LO)         # back to a probability to be used in the Bernoulli
I seem to be getting sensible results; I just want to make sure pymc3 is doing what I think it is doing. An output chain for errLO is given alongside prem, as if I were trying to infer errLO, but of course I’m not, since it is a known and fixed distribution no matter what outcomes are observed. So I don’t want the algorithm to propose values for errLO on every loop; I just want it to sample from this known distribution. If the above is incorrect, how would I go about implementing this?