Inference on tiny values

Hi!
I have a very basic inference problem: a single prior belief and a set of observed values from which I want to infer a posterior.

The thing here is that the values are extremely small. I have the following model:

import numpy as np
import pymc3 as pm

obs = np.ones(100) * 10**-6  # 100 identical observations of 1e-6
prior_mean = 1e-8
prior_sd = 1e-6

with pm.Model() as model:
    prior = pm.Normal('prior', mu=prior_mean, sd=prior_sd)
    pm.Normal('likelihood', mu=prior, observed=obs)
    trace = pm.sample()
    pm.traceplot(trace)

However, although the observed values are 100 times bigger than the prior mean, the posterior looks identical to the prior. This would not have happened if the prior mean were 0 and the observed values were 100, for example.

How do I deal with this small-values issue? I have tried centering the prior without any result.

Thank you for helping out!

PPLs don’t do well sampling such small numbers. It’s best to sample in the range 0.01 to 100 and then convert afterwards if required:

with pm.Model() as model:
    # non-centered parameterisation: sample on the unit scale, then shift
    # and scale back using prior_mean and prior_sd from the first post
    prior_raw = pm.Normal('prior_raw', mu=0, sd=1)
    prior = pm.Deterministic('prior', prior_mean + prior_raw * prior_sd)
    pm.Normal('likelihood', mu=prior, observed=obs)
    trace = pm.sample()
    pm.traceplot(trace)

Hi @nkaimcaudle,
Thank you for getting back to me. I have tried exactly that, but the inference still does not make sense to me. With the same code as above, with this “centering” of the distribution from prior to prior_raw, I get the following result:

What troubles me is that I believe the updated posterior mean should be close to that of the evidence, as the evidence is constant. I believe I may need to scale everything up a lot before scaling it back down.

Is there anyone who can help figure out how I would do this inference and get a posterior distribution close to the evidence value?

Thank you!

The problem is that you assume that the std of the likelihood is 1 (the default value).
You are basically saying that you know exactly how precise your measurements are, and that they are incredibly imprecise compared to the mean, so they add no noticeable amount of information.
By the way, this very simple model can be solved exactly, with no sampler involved. See the first row of this table:
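For concreteness, here is a minimal sketch of that exact conjugate update (normal prior, normal likelihood with known SD), using the numbers from the first post; lik_sd = 1 stands in for the PyMC3 default:

import numpy as np

obs = np.ones(100) * 10**-6
prior_mean, prior_sd = 1e-8, 1e-6
lik_sd = 1.0  # the default likelihood sigma

# conjugate normal-normal update with known likelihood SD
n = len(obs)
post_precision = 1 / prior_sd**2 + n / lik_sd**2
post_mean = (prior_mean / prior_sd**2 + obs.sum() / lik_sd**2) / post_precision
post_sd = (1 / post_precision) ** 0.5

print(post_mean, post_sd)  # ~1e-8 and ~1e-6: the posterior is essentially the prior

The data contribute a precision of only n / lik_sd**2 = 100, against a prior precision of 1 / prior_sd**2 = 1e12, which is why the posterior barely moves.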


Ah, good point. I am a bit of a newbie. How can I avoid this effect of the extremely imprecise likelihood function?

Is there a way for me to sample this the way I am trying to?
I am hesitant to use an analytical solution, as I may have different distributions in the future.

Bjørn

Hi!
I have now tried to set it up in a different way.

I have the following setup:

with pm.Model() as model:
    # std here is presumably the prior SD scale (1e-6) defined earlier
    prior_std_raw = pm.Normal("unscaled_std", mu=0, sd=1)
    prior_std = pm.Deterministic("std", prior_std_raw + std)
    prior_raw = pm.Normal('prior_raw', mu=0, sd=1)
    prior = pm.Deterministic('prior', prior_mean + prior_raw * prior_std)
    pm.Normal('likelihood', mu=prior, sd=prior_std, observed=obs)
    trace = pm.sample()
    pm.traceplot(trace)

My main concern here is that both the prior and the likelihood use the prior_std RV. Is this inadvisable? Should I have two separate std RVs?

Also, the std-RV now has an SD of 1 (the default). Is this going to cause problems?

Thank you for helping out!

I don’t think prior_std = pm.Deterministic("std", prior_std_raw + std) is right; you’ll end up with values much bigger than 1e-6 for the SD. I think I’d set it up something like this:

obs = np.random.randn(100) * 10**-6
with pm.Model() as model:
    prior_mean = pm.Normal('mu', 1)
    prior_mean_scaled = pm.Deterministic('s_mu', prior_mean * 10**-8)
    prior_sd = pm.Normal('sigma', 1) 
    prior_sd_scaled = pm.Deterministic('s_sd', prior_sd * 10**-6)
    prior = pm.Normal('prior', prior_mean, prior_sd)
    likelihood = pm.Normal('likelihood', prior, observed=obs)

It’s set up in a Bayesian way with priors for the parameters as well as for the model itself; I’m not sure if that’s what you were trying to do in the last example.

Thank you for getting back to me!
I tried running your example, and there were a lot of divergences. I also believe the issue of the likelihood std being 1, which @aseyboldt identified as the main problem, is still present in this solution.

Ah, sorry, I copied a line across incorrectly - it was meant to be prior = pm.Normal('prior', prior_mean_scaled, prior_sd_scaled). However, I corrected a scaling error after I initially ran it, and trying it again now I’m getting chain failures.
I’ve incorporated @nkaimcaudle’s suggestion, and it looks like it’s working properly now:

obs = np.random.randn(100) * 10**-6
with pm.Model() as model:
    prior_mean = pm.Normal('mu', 1)
    prior_mean_scaled = pm.Deterministic('s_mu', prior_mean * 10**-8)
    prior_sd = pm.Normal('sigma', 1) 
    prior_sd_scaled = pm.Deterministic('s_sd', prior_sd * 10**-6)
    prior = pm.Deterministic('prior', prior_mean_scaled + prior_sd_scaled)
    likelihood = pm.Normal('likelihood', prior, observed=obs)

Awesome! Last question: what about the sd of the likelihood function still being equal to 1?

Should this be addressed somehow?


Hmm, good question. Might depend on what you want to do.

I don’t think it matters for pm.sample because it’s taking the observations as input, and doesn’t appear in the trace.

If you want to use pm.sample_posterior_predictive then it looks like it might. Taking the mean of the posterior['likelihood'] samples gives ~0.0088, which is way too high. As you noted earlier, prior_sd_scaled is already used, so it might be best to define an extra prior. I just experimented a bit with a large SD and it looks like the posterior is updated for the likelihood prior even if it only appears in the likelihood (I wasn’t sure if that worked, seeing as the initial samples are from the observations).

with pm.Model() as model:
    prior_mean = pm.Normal('mu', 1)
    prior_mean_scaled = pm.Deterministic('s_mu', prior_mean * 10**-8)
    prior_sd = pm.Normal('sigma', 1)
    prior_sd_scaled = pm.Deterministic('s_sd', prior_sd * 10**-6)
    prior = pm.Deterministic('prior', prior_mean_scaled + prior_sd_scaled)
    likely_prior = pm.Normal('likely_sd', 1)
    likely_prior_scaled = pm.Deterministic('likely_sd_scaled', likely_prior * 10**-3)
    likelihood = pm.Normal('likelihood', prior, likely_prior_scaled, observed=obs)
    trace = pm.sample(200)
    pos = pm.sample_posterior_predictive(trace, 200)

And we get a mean of ~2.72e-07, which looks much better.
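For clarity, that mean would presumably be computed from the posterior predictive samples, something like:

# pos is the dict returned by pm.sample_posterior_predictive above
print(pos['likelihood'].mean())  # ~2.72e-07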

Thank you so much for taking the time to help me out here, truly very valuable!

There are still two things I don’t feel like I understand here:
Firstly, why on earth doesn’t this suggestion by @nkaimcaudle work?

with pm.Model() as model:
    prior_raw = pm.Normal('prior_raw', mu=0, sd=1)
    prior = pm.Deterministic('prior', prior_mean + prior_raw * prior_sd)
    pm.Normal('likelihood', mu=prior, observed=obs)
    trace = pm.sample()
    pm.traceplot(trace)

This seems to me like the exact same model, just with less freedom due to the less hierarchical construction; however, the sd and mean of the posterior (former prior) are not affected noticeably by the sampling.

Secondly, I’m still struggling to see exactly how the observed argument turns the distribution into a likelihood. What does the RV you named likely_prior do? It is an SD argument, but saying what?

Thank you so much for helping out!

Have you tried something like:

with pm.Model() as model:
    prior_raw = pm.Normal('prior_raw', mu=0, sigma=1)
    prior = pm.Deterministic('prior', prior_mean + prior_raw*prior_sd)

    sigma = pm.HalfNormal('sigma', 0.1)
    
    pm.Normal('likelihood', mu=prior, sigma=sigma, observed=obs)

That way, the likelihood’s standard deviation should be inferred by MCMC :thinking:

Also, in @Elenchus’ last suggestion, I think that likely_prior = pm.Normal('likely_sd', 1) should actually be likely_prior = pm.HalfNormal('likely_sd', 1), otherwise you can end up with negative values for a standard deviation parameter.


No worries, I feel like I’m learning a lot too. These are good questions.

Good catch, thanks @AlexAndorra - all the standard deviation priors should be half normal, possibly with a mean of 0 instead of 1

I thought that with @nkaimcaudle’s suggestion the problem was what @aseyboldt mentioned, with the relative size of the standard deviation in the prior - my model has a small SD in the prior. Looking at @AlexAndorra’s suggestion though, the SD is in the likelihood and not the prior. What’s the difference in how these affect the model, Alex? I gather it’s needed in the likelihood for posterior predictive checks, but shouldn’t it be specified in the prior as well?

For your question about the observed argument: a likelihood is P(E|H), the probability of the evidence given the hypothesis, hence incorporating the observations with the prior.

Thank you for answering, @AlexAndorra. I have tried that, and the chains seem to fail.

I am still somewhat hesitant to accept this formulation of the prior:

prior = pm.Deterministic('prior', prior_mean_scaled + prior_sd_scaled)

If prior_sd_scaled has a mean of 1e-6, adding it to prior_mean_scaled will shift the mean to 1e-6 + 1e-8, which is no longer the desired prior mean of 1e-8. Also, the SD of the sum of the two distributions will no longer be the initial prior SD…
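Should the line perhaps include the standardized draw instead, something like this (my guess, reusing the names from the earlier examples)?

prior_raw = pm.Normal('prior_raw', mu=0, sd=1)
prior = pm.Deterministic('prior', prior_mean_scaled + prior_raw * prior_sd_scaled)

That way the prior would stay centred around 1e-8 with a spread around 1e-6, as intended.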

The way I proposed adds a prior on the sigma in the Normal likelihood, so this parameter will be updated from data during sampling.
Otherwise, I seem to remember that just specifying pm.Normal('likelihood', mu=prior, observed=obs) will set sigma=1 by default in PyMC3, meaning the sigma of the Normal likelihood is assumed to be exactly 1 :thinking:
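A quick way to check that default (a sketch, using the PyMC3 3.x API, where distributions expose their parameters as attributes):

import pymc3 as pm

# pm.Normal.dist() builds a free-standing Normal with the default parameters
print(pm.Normal.dist().sigma.eval())  # prints 1.0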