Standard deviation of Likelihood - sample size dependent?

Hi,
I am really hoping someone may be able to help me figure out this simple issue:

I have a list of data (called obs) of length x. I put a normal prior on the mean, with parameters mu and sd, and a half-normal prior on the standard deviation of the data, with scale obs_sd. I believe my data may be normally distributed, hence I use a normal likelihood.
Before specifying the data, I then have the following model:

import pymc3 as pm

mu = 10      # prior mean for the data's mean
sd = 2       # prior sd for the data's mean
obs_sd = 1   # scale of the half-normal prior on the data's sd

with pm.Model() as model:
    prior_m = pm.Normal('p_m', mu=mu, sd=sd)
    prior_sd = pm.HalfNormal('p_sd', sd=obs_sd)

    # obs is my data list, defined elsewhere
    L = pm.Normal('L', mu=prior_m, sd=prior_sd, observed=obs)

Here is my issue:
If I have only one observed value in obs, I get divergences with this model; with two or more observations, it seems to be fine. Also, if I set a known standard deviation on the likelihood, it seems to be fine even with a single observation.
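
For reference, a minimal sketch of the fixed-sd variant I mean (the 1.0 is just a stand-in value; obs is the same data list as above):

with pm.Model() as model_fixed_sd:
    prior_m = pm.Normal('p_m', mu=mu, sd=sd)
    # likelihood sd fixed to a known constant instead of being inferred
    L = pm.Normal('L', mu=prior_m, sd=1.0, observed=obs)

with model_fixed_sd:
    trace = pm.sample()  # samples cleanly for me, even with a single observation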

So my question is: Is the lack of multiple observed values somehow affecting the posterior of the standard deviation of the likelihood? If so, why/how?

Do you have problems with two observations? I would guess the problem is that prior_sd is unconstrained (other than by its prior) when you have a single observation: one data point carries no information about spread. You are also trying to estimate two parameters from a single data point, which is usually problematic.

Hi Bjorn,
I think @ricardoV94’s answer is spot-on: you’re trying to estimate a standard deviation from one data point, which is impossible, so it’s expected that NUTS spits out divergences.
Hope this helps :vulcan_salute:

Thank you so much for answering, @ricardoV94 and @AlexAndorra. It makes a lot of sense, and I am embarrassed it wasn’t obvious in the first place, heh.

I do have problems with two observations, depending on my confidence in the prior mean. If I set the sd of the prior on the mean to 0.01, it seems to be OK; if I set it to 1 or 2, I get divergences.

This makes sense too: when you have very little data, the prior’s weight in inference is more important, so you have to use more informative priors if you want to infer anything. That’s where domain knowledge enters the equation and allows you to choose sensible priors that will help inference, even when there isn’t much data.
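
Concretely, something like this (the 0.01 is just the value from your own experiment; in practice you’d pick it from domain knowledge):

with pm.Model() as informative_model:
    # tight, informative prior on the mean: we're quite sure it is close to 10
    prior_m = pm.Normal('p_m', mu=10, sd=0.01)
    prior_sd = pm.HalfNormal('p_sd', sd=1)
    L = pm.Normal('L', mu=prior_m, sd=prior_sd, observed=obs)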