Hi, I have a simple model where I have observations from a set of Bernoulli trials (i.e. 1s and 0s), and the probability of success is different for each trial. For each trial there is a baseline probability of success, which I have available as an array; however, an unknown constant premium is added to every baseline (in the log odds domain), and it is this premium whose posterior distribution I am trying to find.
The premium lies between -2 and 2, so I use a uniform prior. I then create the relationship between prem and pS, the probability of success in the Bernoulli trial. My code is below:
# base: array of baseline success probabilities; Outcome_obs: array of observed 0/1 outcomes
import numpy as np
import pymc3 as pm

with pm.Model() as model:
    # prior
    prem = pm.Uniform('prem', lower=-2, upper=2)
    base_LO = np.log(base / (1 - base))  # convert array of baselines into log odds
    pS_LO = base_LO + prem               # add the premium to every baseline
    pS = pm.math.invlogit(pS_LO)         # convert back to a probability
    # likelihood function
    Outcome = pm.Bernoulli('Outcome', p=pS, observed=Outcome_obs)
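The posterior for prem can then be drawn with the usual sampling call, for example:
with model:
    trace = pm.sample(2000, tune=1000)  # posterior samples for prem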
I think this is correct and appears to be giving me distributions I am expecting. However I want to modify the system to take into account the fact that the array of baseline probabilities has an error attached to each value, with this error being taken from a normal distribution with mean of zero and a known, fixed standard deviation of 0.2.
I don’t know how to reflect this in the above model. Any help is appreciated.
For these sorts of models, the best approach in my opinion is to write down the math of the generative process. You have:
- The base probability p_{base} and its log odds, logit(p_{base}).
- The error \varepsilon.
- The prem.
Your generative process will look like this:
$$
\varepsilon \sim \text{Normal}(0, 0.2) \\
\text{prem} \sim \text{Uniform}(-2, 2) \\
\text{Outcome} \sim \text{Bernoulli}\left(\text{expit}\left(\varepsilon + \text{prem} + \text{logit}(p_{base})\right)\right)
$$
However, you have to be aware that it will be hard, if not impossible, to distinguish between the effects of \varepsilon and prem (the parameters might be unidentifiable).
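A minimal PyMC3 sketch of this generative process, assuming base and Outcome_obs are the arrays from your post (the other names here are just illustrative), might look like:
import pymc3 as pm

with pm.Model() as gen_model:
    eps = pm.Normal('eps', mu=0, sd=0.2)           # error term on the log-odds scale
    prem = pm.Uniform('prem', lower=-2, upper=2)   # prior on the premium
    p = pm.math.invlogit(pm.math.logit(base) + prem + eps)
    outcome = pm.Bernoulli('outcome', p=p, observed=Outcome_obs)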
Thanks for your response. I think I’m comfortable with the process; I’m just looking for a way to reflect the added uncertainty in the system, and what that means for the range of values prem could take given the observed data.
Say I have 50 data points, so 50 p(base) values measured with error and 50 observed outcomes. I would like to add 50 error terms to my p(base) values, drawn from the normal distribution, and then take a new sample of 50 error terms on every proposal loop that forms the chain.
In an MCMC algorithm that I write from scratch I can just put this within a code section that gets looped on every proposal, so I’m wondering how I would implement this in pymc3. Is something like the following correct?
base_LO = np.log(base / (1 - base))  # convert array of bases into log odds
errLO = pm.Normal('errLO', mu=0, sd=0.2)
pS_LO = base_LO + prem + errLO       # added error term
pS = pm.math.invlogit(pS_LO)         # back to a probability to be used in the Bernoulli
I seem to be getting sensible results; I just want to make sure pymc3 is doing what I think it is doing. An output chain for errLO is given alongside prem, as if I were trying to infer errLO, but of course I’m not, since it is a known and fixed distribution no matter what outcomes are observed. So I don’t want the algorithm to propose values for errLO on every loop; I just want it to sample from this known distribution. If the above is incorrect, how would I go about implementing this?