Inference on tiny values

Hi!
I have a very basic inference problem: a single prior belief and a set of observed values from which I want to infer a posterior.

The thing here is that the values are extremely small. I have the following model:

import numpy as np
import pymc3 as pm

obs = np.ones(100) * 10**-6  # 100 identical observations of 1e-6
prior_mean = 1e-8
prior_sd = 1e-6

with pm.Model() as model:
    prior = pm.Normal('prior', mu=prior_mean, sd=prior_sd)
    pm.Normal('likelihood', mu=prior, observed=obs)
    trace = pm.sample()
    pm.traceplot(trace)

However, although the observed values are 100 times bigger than the prior mean, the posterior looks identical to the prior. This would not have happened if the prior mean were 0 and the observed values were 100, for example.

How do I deal with this small-values issue? I have tried centering the prior without any result.

Thank you for helping out!

PPLs don’t do well sampling such small numbers. It’s best to sample in the range 0.01 to 100 and then convert afterwards if required:

with pm.Model() as model:
    # non-centered parameterisation: sample on the unit scale, then shift
    # and scale back using prior_mean and prior_sd from the first post
    prior_raw = pm.Normal('prior_raw', mu=0, sd=1)
    prior = pm.Deterministic('prior', prior_mean + prior_raw * prior_sd)
    pm.Normal('likelihood', mu=prior, observed=obs)
    trace = pm.sample()
    pm.traceplot(trace)

Hi @nkaimcaudle,
Thank you for getting back to me. I have tried exactly that, but the inference still does not make sense to me. With the same code as above, with this “centering” of the distribution from prior to prior_raw, I get the following result:

What troubles me is that I believe the updated posterior mean should be close to that of the evidence, as the evidence is constant. I believe I may need to scale everything up a lot before scaling it back down.

Is there anyone who can help figure out how I would do this inference and get a posterior distribution close to the evidence value?

Thank you!

The problem is that you assume that the std of the likelihood is 1 (the default value).
You are basically saying that you know exactly how precise your measurements are, and that they are incredibly imprecise compared to the mean, so they add no noticeable amount of information.
By the way, this very simple model can be solved exactly, with no sampler involved. See the first row of this table:
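For concreteness, here is a minimal sketch of that exact conjugate update (normal prior, normal likelihood with known SD), using the numbers from the first post; lik_sd = 1 stands in for the PyMC3 default:

import numpy as np

obs = np.ones(100) * 10**-6
prior_mean, prior_sd = 1e-8, 1e-6
lik_sd = 1.0  # the default likelihood sigma

# conjugate normal-normal update with known likelihood SD
n = len(obs)
post_precision = 1 / prior_sd**2 + n / lik_sd**2
post_mean = (prior_mean / prior_sd**2 + obs.sum() / lik_sd**2) / post_precision
post_sd = (1 / post_precision) ** 0.5

print(post_mean, post_sd)  # ~1e-8 and ~1e-6: the posterior is essentially the prior

The data contribute a precision of only n / lik_sd**2 = 100, against a prior precision of 1 / prior_sd**2 = 1e12, which is why the posterior barely moves.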


Ah, good point. I am a bit of a newbie. How can I avoid this effect of the extremely imprecise likelihood function?

Is there a way for me to sample this the way I am trying to?
I am hesitant to use an analytical solution, as I may have different distributions in the future.

Bjørn

Hi!
I have now tried to set it up in a different way.

I have the following setup:

with pm.Model() as model:
    # std here is presumably the prior SD scale (1e-6) defined earlier
    prior_std_raw = pm.Normal("unscaled_std", mu=0, sd=1)
    prior_std = pm.Deterministic("std", prior_std_raw + std)
    prior_raw = pm.Normal('prior_raw', mu=0, sd=1)
    prior = pm.Deterministic('prior', prior_mean + prior_raw * prior_std)
    pm.Normal('likelihood', mu=prior, sd=prior_std, observed=obs)
    trace = pm.sample()
    pm.traceplot(trace)

My main concern here is that both the prior and the likelihood use the prior_std RV. Is this inadvisable? Should I have two separate std RVs?

Also, the std-RV now has an SD of 1 (the default). Is this going to cause problems?

Thank you for helping out!

I don’t think prior_std = pm.Deterministic("std", prior_std_raw + std) is right; you’ll end up with values much bigger than 1e-6 for the SD. I think I’d set it up something like this:

obs = np.random.randn(100) * 10**-6
with pm.Model() as model:
    prior_mean = pm.Normal('mu', 1)
    prior_mean_scaled = pm.Deterministic('s_mu', prior_mean * 10**-8)
    prior_sd = pm.Normal('sigma', 1) 
    prior_sd_scaled = pm.Deterministic('s_sd', prior_sd * 10**-6)
    prior = pm.Normal('prior', prior_mean, prior_sd)
    likelihood = pm.Normal('likelihood', prior, observed=obs)

It’s set up in a Bayesian way with priors for the parameters as well as for the model itself; I’m not sure if that’s what you were trying to do in the last example.

Thank you for getting back to me!
I tried running your example, and there were a lot of divergences. I also believe the issue of the likelihood std being 1, which @aseyboldt identified as the main problem, is still present in this solution.

Ah, sorry, I copied a line across incorrectly - it was meant to be prior = pm.Normal('prior', prior_mean_scaled, prior_sd_scaled). However, I corrected a scaling error after I initially ran it, and trying it again now I’m getting chain failures.
I’ve incorporated @nkaimcaudle’s suggestion, and it looks like it’s working properly now:

obs = np.random.randn(100) * 10**-6
with pm.Model() as model:
    prior_mean = pm.Normal('mu', 1)
    prior_mean_scaled = pm.Deterministic('s_mu', prior_mean * 10**-8)
    prior_sd = pm.Normal('sigma', 1) 
    prior_sd_scaled = pm.Deterministic('s_sd', prior_sd * 10**-6)
    prior = pm.Deterministic('prior', prior_mean_scaled + prior_sd_scaled)
    likelihood = pm.Normal('likelihood', prior, observed=obs)

Awesome! Last question: what about the sd of the likelihood function still being equal to 1?

Should this be addressed somehow?


Hmm, good question. Might depend on what you want to do.

I don’t think it matters for pm.sample because it’s taking the observations as input, and doesn’t appear in the trace.

If you want to use pm.sample_posterior_predictive then it looks like it might. Taking the mean of the posterior['likelihood'] samples gives ~0.0088, which is way too high. As you noted earlier, prior_sd_scaled is already used, so it might be best to define an extra prior. I just experimented a bit with a large SD and it looks like the posterior is updated for the likelihood prior even if it only appears in the likelihood (I wasn’t sure if that worked, seeing as the initial samples are from the observations).

with pm.Model() as model:
    prior_mean = pm.Normal('mu', 1)
    prior_mean_scaled = pm.Deterministic('s_mu', prior_mean * 10**-8)
    prior_sd = pm.Normal('sigma', 1)
    prior_sd_scaled = pm.Deterministic('s_sd', prior_sd * 10**-6)
    prior = pm.Deterministic('prior', prior_mean_scaled + prior_sd_scaled)
    likely_prior = pm.Normal('likely_sd', 1)
    likely_prior_scaled = pm.Deterministic('likely_sd_scaled', likely_prior * 10**-3)
    likelihood = pm.Normal('likelihood', prior, likely_prior_scaled, observed=obs)
    trace = pm.sample(200)
    pos = pm.sample_posterior_predictive(trace, 200)

And we get a mean of ~2.72e-07, which looks much better.
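For clarity, that mean would presumably be computed from the posterior predictive samples, something like:

# pos is the dict returned by pm.sample_posterior_predictive above
print(pos['likelihood'].mean())  # ~2.72e-07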

Thank you so much for taking the time to help me out here, truly very valuable!

There are still two things I don’t feel like I understand here:
Firstly, why on earth doesn’t this suggestion by @nkaimcaudle work?

with pm.Model() as model:
    prior_raw = pm.Normal('prior_raw', mu=0, sd=1)
    prior = pm.Deterministic('prior', prior_mean + prior_raw * prior_sd)
    pm.Normal('likelihood', mu=prior, observed=obs)
    trace = pm.sample()
    pm.traceplot(trace)

This seems to me like the exact same model, just with less freedom due to the less hierarchical construction; however, the sd and mean of the posterior (former prior) are not affected noticeably by the sampling.

Secondly, I’m still struggling to see exactly how the observed argument turns the distribution into a likelihood. What does the RV you named likely_prior do? It is an SD argument, but saying what?

Thank you so much for helping out!

Have you tried something like:

with pm.Model() as model:
    prior_raw = pm.Normal('prior_raw', mu=0, sigma=1)
    prior = pm.Deterministic('prior', prior_mean + prior_raw*prior_sd)

    sigma = pm.HalfNormal('sigma', 0.1)
    
    pm.Normal('likelihood', mu=prior, sigma=sigma, observed=obs)

That way, the likelihood’s standard deviation should be inferred by MCMC :thinking:

Also, in @Elenchus’ last suggestion, I think that likely_prior = pm.Normal('likely_sd', 1) should actually be likely_prior = pm.HalfNormal('likely_sd', 1), otherwise you can end up with negative values for a standard deviation parameter.


No worries, I feel like I’m learning a lot too. These are good questions.

Good catch, thanks @AlexAndorra - all the standard deviation priors should be half normal, possibly with a mean of 0 instead of 1

I thought that with @nkaimcaudle’s suggestion the problem was what @aseyboldt mentioned, with the relative size of the standard deviation in the prior - my model has a small SD in the prior. Looking at @AlexAndorra’s suggestion though, the SD is in the likelihood and not the prior. What’s the difference in how these affect the model, Alex? I gather it’s needed in the likelihood for posterior predictive checks, but shouldn’t it be specified in the prior as well?

For your question about the observed argument: a likelihood is P(E|H), the probability of the evidence given the hypothesis, hence incorporating the observations with the prior.

Thank you for answering, @AlexAndorra. I have tried that, and the chains seem to fail.

I am still somewhat hesitant to accept this formulation of the prior:

prior = pm.Deterministic('prior', prior_mean_scaled + prior_sd_scaled)

If prior_sd_scaled has a mean of 1e-6, adding it to prior_mean_scaled will shift the mean to 1e-6 + 1e-8, which is no longer the desired prior mean of 1e-8. Also, the SD of the sum of the two distributions will no longer be the initial prior SD…
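Should the line perhaps include the standardized draw instead, something like this (my guess, reusing the names from the earlier examples)?

prior_raw = pm.Normal('prior_raw', mu=0, sd=1)
prior = pm.Deterministic('prior', prior_mean_scaled + prior_raw * prior_sd_scaled)

That way the prior would stay centred around 1e-8 with a spread around 1e-6, as intended.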

The way I proposed adds a prior on the sigma in the Normal likelihood, so this parameter will be updated from data during sampling.
Otherwise, I seem to remember that just specifying pm.Normal('likelihood', mu=prior, observed=obs) will set sigma=1 by default in PyMC3, meaning the sigma of the Normal likelihood is assumed to be exactly 1 :thinking:
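A quick way to check that default (a sketch, using the PyMC3 3.x API, where distributions expose their parameters as attributes):

import pymc3 as pm

# pm.Normal.dist() builds a free-standing Normal with the default parameters
print(pm.Normal.dist().sigma.eval())  # prints 1.0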