Hi!
I have a very basic inference problem: a single prior belief and a set of observed values from which I want to infer a posterior.
The thing here is that the values are extremely small. I have the following model:
import numpy as np
import pymc3 as pm

obs = np.ones(100) * 10**-6
prior_mean = 1e-8
prior_sd = 1e-6

with pm.Model() as model:
    prior = pm.Normal('prior', mu=prior_mean, sd=prior_sd)
    pm.Normal('likelihood', mu=prior, observed=obs)
    trace = pm.sample()
pm.traceplot(trace)
However, although the observed values are 100 times bigger than the prior mean, the posterior looks identical to the prior. This would not have happened if, for example, the prior mean were 0 and the observed values were 100.
How do I deal with this small-values issue? I have tried centering the prior, without any result.
Thank you for helping out!
PPLs don't do well sampling such small numbers. It's best to sample in the range 0.01 - 100 and then convert afterwards if required.
with pm.Model() as model:
    prior_raw = pm.Normal('prior_raw', mu=0, sd=1)
    prior = pm.Deterministic('prior', prior_mean + prior_raw * prior_sd)
    pm.Normal('likelihood', mu=prior, observed=obs)
    trace = pm.sample()
pm.traceplot(trace)
Hi @nkaimcaudle,
Thank you for getting back to me. I have tried exactly that, but the inference still does not make sense to me. With the same code as above, with this 'centering' of the distribution from prior to prior_raw, I get the following result:
What troubles me is that I believe the updated posterior mean should be close to that of the evidence, as the evidence is constant. I believe I may need to scale everything up a lot before scaling it back down, something like the sketch below.
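To make the scaling idea concrete, this is roughly what I mean (just a sketch; the scale factor and the likelihood sd of 1 on the scaled axis are my own assumptions, not settled choices):

import numpy as np
import pymc3 as pm

# Sketch: multiply everything by 1e6 so the sampler works on O(1) numbers,
# then divide the posterior samples back down afterwards.
scale = 1e6
obs = np.ones(100) * 1e-6

with pm.Model() as scaled_model:
    prior = pm.Normal('prior', mu=1e-8 * scale, sd=1e-6 * scale)
    # a likelihood sd of 1 on the scaled axis corresponds to 1e-6 originally
    pm.Normal('likelihood', mu=prior, sd=1.0, observed=obs * scale)
    trace = pm.sample()

prior_samples = trace['prior'] / scale  # back on the original scale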
Is there anyone who can help figure out how I would do this inference, and get a posterior distribution close to the evidence value?
Thank you!
The problem is that you assume the standard deviation of the likelihood is 1 (the default value).
You are basically saying that you know exactly how precise your measurements are, and that they are incredibly imprecise compared to the mean, so they add no noticeable amount of information.
By the way, this very simple model can be solved exactly, with no sampler involved; see the first row of the usual table of normal conjugate priors.
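To illustrate with the standard normal-normal conjugate formulas (using the numbers from the original post):

import numpy as np

# Exact normal-normal update with a known likelihood sd (PyMC3's default of 1).
obs = np.ones(100) * 1e-6
prior_mean, prior_sd = 1e-8, 1e-6
lik_sd = 1.0  # the sd PyMC3 assumes when none is given

post_precision = 1 / prior_sd**2 + len(obs) / lik_sd**2
post_sd = np.sqrt(1 / post_precision)
post_mean = (prior_mean / prior_sd**2 + obs.sum() / lik_sd**2) / post_precision
print(post_mean, post_sd)  # ~1e-8 and ~1e-6: essentially the prior, unchanged

With lik_sd = 1, the 100 observations contribute a precision of only 100 against the prior's precision of 1e12, which is why the posterior looks identical to the prior.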
Ah, good point. I am a bit of a newbie. How can I avoid this effect of the extremely imprecise likelihood function?
Is there a way for me to sample this the way I am trying to?
I am hesitant to use an analytical solution, as I may have different distributions in the future.
Bjørn
Hi!
I have now tried to set it up in a different way.
I have the following setup:
with pm.Model() as model:
    prior_std_raw = pm.Normal("unscaled_std", mu=0, sd=1)
    prior_std = pm.Deterministic("std", prior_std_raw + std)
    prior_raw = pm.Normal('prior_raw', mu=0, sd=1)
    prior = pm.Deterministic('prior', prior_mean + prior_raw * prior_std)
    pm.Normal('likelihood', mu=prior, sd=prior_std, observed=obs)
    trace = pm.sample()
pm.traceplot(trace)
My main concern here is that both the prior and the likelihood use the prior_std RV. Is this inadvisable? Should I have two separate std RVs?
Also, the sd of the std RV is now 1 (the default); is this going to cause problems?
Thank you for helping out!
I don't think prior_std = pm.Deterministic("std", prior_std_raw + std) is right; you'll end up with values much bigger than 1e-6 for the SD. I think I'd set it up something like this:
obs = np.random.randn(100) * 10**-6

with pm.Model() as model:
    prior_mean = pm.Normal('mu', 1)
    prior_mean_scaled = pm.Deterministic('s_mu', prior_mean * 10**-8)
    prior_sd = pm.Normal('sigma', 1)
    prior_sd_scaled = pm.Deterministic('s_sd', prior_sd * 10**-6)
    prior = pm.Normal('prior', prior_mean, prior_sd)
    likelihood = pm.Normal('likelihood', prior, observed=obs)
It's set up in a Bayesian way, with priors for the parameters as well as for the model itself; I'm not sure if that's what you were trying to do in the last example.
Thank you for getting back to me!
I tried running your example, and there were a lot of divergences. I also believe the issue of having a likelihood std of 1, which @aseyboldt identified as the main problem, is still present in this solution.
Ah, sorry, I copied a line across incorrectly; it was meant to be prior = pm.Normal('prior', prior_mean_scaled, prior_sd_scaled). However, I corrected a scaling error after I initially ran it, and trying it again now I'm getting chain failures.
I've incorporated @nkaimcaudle's suggestion, and it looks like it's working properly now:
obs = np.random.randn(100) * 10**-6

with pm.Model() as model:
    prior_mean = pm.Normal('mu', 1)
    prior_mean_scaled = pm.Deterministic('s_mu', prior_mean * 10**-8)
    prior_sd = pm.Normal('sigma', 1)
    prior_sd_scaled = pm.Deterministic('s_sd', prior_sd * 10**-6)
    prior = pm.Deterministic('prior', prior_mean_scaled + prior_sd_scaled)
    likelihood = pm.Normal('likelihood', prior, observed=obs)
Awesome! Last question: what about the sd of the likelihood function still being equal to 1?
Should this be addressed somehow?
Hmm, good question. Might depend on what you want to do.
I don't think it matters for pm.sample, because it's taking the observations as input, and the likelihood doesn't appear in the trace.
If you want to use pm.sample_posterior_predictive then it looks like it might. Taking the mean of the posterior['likelihood'] samples gives ~0.0088, which is way too high. As you noted earlier, prior_sd_scaled is already used, so it might be best to define an extra prior. I just experimented a bit with a large SD, and it looks like the posterior is updated for the likelihood prior even if it only appears in the likelihood (I wasn't sure if that worked, seeing as the initial samples are from the observations).
with pm.Model() as model:
    prior_mean = pm.Normal('mu', 1)
    prior_mean_scaled = pm.Deterministic('s_mu', prior_mean * 10**-8)
    prior_sd = pm.Normal('sigma', 1)
    prior_sd_scaled = pm.Deterministic('s_sd', prior_sd * 10**-6)
    prior = pm.Deterministic('prior', prior_mean_scaled + prior_sd_scaled)
    likely_prior = pm.Normal('likely_sd', 1)
    likely_prior_scaled = pm.Deterministic('likely_sd_scaled', likely_prior * 10**-3)
    likelihood = pm.Normal('likelihood', prior, likely_prior_scaled, observed=obs)
    trace = pm.sample(200)
    pos = pm.sample_posterior_predictive(trace, 200)
And we get a mean of ~2.72e-07, which looks much better.
Thank you so much for taking the time to help me out here, truly very valuable!
There are still two things I don't feel like I understand:
Firstly, why doesn't the suggestion by @nkaimcaudle work?
with pm.Model() as model:
    prior_raw = pm.Normal('prior_raw', mu=0, sd=1)
    prior = pm.Deterministic('prior', prior_mean + prior_raw * prior_sd)
    pm.Normal('likelihood', mu=prior, observed=obs)
    trace = pm.sample()
pm.traceplot(trace)
This seems to me like the exact same model, just with less freedom due to the less hierarchical construction. However, the sd and mean of the posterior (the former prior) are not noticeably affected by the sampling.
Secondly, I'm still struggling to see exactly how the observed argument changes the distribution into a likelihood. What does the RV you named likely_prior do? It is an SD argument, but what does it say?
Thank you so much for helping out!
Have you tried something like:
with pm.Model() as model:
    prior_raw = pm.Normal('prior_raw', mu=0, sigma=1)
    prior = pm.Deterministic('prior', prior_mean + prior_raw * prior_sd)
    sigma = pm.HalfNormal('sigma', 0.1)
    pm.Normal('likelihood', mu=prior, sigma=sigma, observed=obs)
That way, the likelihood's standard deviation should be inferred by MCMC.
Also, in @Elenchus' last suggestion, I think that likely_prior = pm.Normal('likely_sd', 1) should actually be likely_prior = pm.HalfNormal('likely_sd', 1); otherwise you can end up with negative values for a standard deviation parameter.
No worries, I feel like I'm learning a lot too. These are good questions.
Good catch, thanks @AlexAndorra. All the standard deviation priors should be half-normal, possibly with a mean of 0 instead of 1.
I thought that with @nkaimcaudle's suggestion the problem was what @aseyboldt mentioned, i.e. the relative size of the standard deviation in the prior; my model has a small SD in the prior. Looking at @AlexAndorra's suggestion, though, the SD is in the likelihood and not the prior. What's the difference in how these affect the model, Alex? I gather it's needed in the likelihood for posterior predictive checks, but shouldn't it be specified in the prior as well?
For your question about the observed argument: a likelihood is P(E|H), the probability of the evidence given the hypothesis, which is how the observations get incorporated with the prior.
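Spelled out with Bayes' rule (standard notation, nothing PyMC3-specific):

P(H|E) = P(E|H) * P(H) / P(E)

The observed argument supplies E, the rest of the model defines P(H), and the distribution that observed is attached to is evaluated as P(E|H) during sampling.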
Thank you for answering, @AlexAndorra. I have, and the chains seem to fail.
I am still somewhat hesitant to accept this formulation of the prior:
prior = pm.Deterministic('prior', prior_mean_scaled + prior_sd_scaled)
If prior_sd_scaled has a mean of 1e-6, adding it to prior_mean_scaled will shift the mean to 1e-6 + 1e-8, which is no longer the desired prior mean of 1e-8. Also, the SD of the sum of the two distributions will no longer be that of the initial prior… Something like the sketch below is closer to what I had in mind.
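For comparison, a construction that keeps a standard-normal raw variable between the mean and the sd, so the prior keeps its intended location and scale (just a sketch; the half-normal scale for the likelihood sd is my own guess):

import numpy as np
import pymc3 as pm

obs = np.ones(100) * 1e-6

with pm.Model() as model:
    prior_raw = pm.Normal('prior_raw', mu=0, sigma=1)
    # mean and sd enter separately, so the prior keeps mean 1e-8 and sd 1e-6
    prior = pm.Deterministic('prior', 1e-8 + prior_raw * 1e-6)
    sigma_raw = pm.HalfNormal('sigma_raw', sigma=1)
    sigma = pm.Deterministic('sigma', sigma_raw * 1e-6)  # likelihood sd, positive by construction
    pm.Normal('likelihood', mu=prior, sigma=sigma, observed=obs)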
The way I proposed adds a prior on the sigma in the Normal likelihood, so this parameter will be updated from data during sampling.
Otherwise, I seem to remember that just specifying pm.Normal('likelihood', mu=prior, observed=obs) will set sigma=1 by default in PyMC3, which would mean the sigma in the Normal likelihood is assumed to be equal to 1.