Issues with Truncated Normal

Hello,

Has anyone had an issue using a truncated normal distribution for the likelihood? I'm trying to fit a very simple model for my target variable, which has been scaled to the [0, 1] range. The version using a normal likelihood is below.

coords = {'cann': [1, 0]}
with pm.Model(coords=coords) as cannibal_model:
    cannibal = pm.Data('cannibal', cann_idx, mutable=True)
    obs = pm.Data('obs', obs_array, mutable=True)

    # beta = pm.Normal('beta', mu=0, sigma=0.1)
    alpha = pm.Normal('alpha', mu=0, sigma=0.05, dims=['cann'])

    mu = alpha[cannibal]
    sigma = pm.HalfCauchy('sigma', beta=0.1)

    eaches = pm.Normal('predicted_eaches',
                       mu=mu,
                       sigma=sigma,
                       # lower=0,
                       # upper=1,
                       observed=obs)

    idata = pm.sampling_jax.sample_numpyro_nuts(draws=1000, tune=2000, target_accept=0.95)

This gives me a great trace plot, but my posterior predictive check (PPC) produces estimates above 1.


[Image: posterior predictive plot with estimates exceeding 1]
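For reference, here is roughly how a PPC can be generated from the model above (a minimal sketch; it assumes the cannibal_model context and idata from the code block above):

import arviz as az

with cannibal_model:
    # Draw posterior predictive samples from the fitted model
    idata.extend(pm.sample_posterior_predictive(idata))

# Overlay the observed data on the posterior predictive draws
az.plot_ppc(idata, var_names=['predicted_eaches'])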

When I run the same model with a truncated normal for the likelihood, as below, the trace looks off but the PPC looks… better.


coords = {'cann': cann}
with pm.Model(coords=coords) as cannibal_model:
    cannibal = pm.Data('cannibal', cann_idx, mutable=True)
    obs = pm.Data('obs', obs_array, mutable=True)

    # beta = pm.Normal('beta', mu=0, sigma=0.1)
    alpha = pm.Normal('alpha', mu=0, sigma=0.05, dims=['cann'])

    mu = alpha[cannibal]
    sigma = pm.HalfCauchy('sigma', beta=0.1)

    eaches = pm.TruncatedNormal('predicted_eaches',
                                mu=mu,
                                sigma=sigma,
                                # lower=0,
                                upper=1,
                                observed=obs)

    idata = pm.sampling_jax.sample_numpyro_nuts(draws=1000, tune=2000, target_accept=0.95)


[Image: trace plot and posterior predictive plot for the truncated-normal model]

Is this a parameterization issue, or is this behavior characteristic of a truncated normal?

What does your data actually look like? The PPC plot with that spike at 1 is very strange, unless you have censored data rather than truncated data.

Here is a histogram of my actual data:

[Image: histogram of the actual data]

I'm not sure what you mean by censored data.

You said you scaled your data to be in the [0, 1] range; how did you do that?

The difference between censored and truncated data is illustrated here: Bayesian regression with truncated or censored data — PyMC example gallery
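In PyMC terms the two cases look roughly like this (a minimal sketch with made-up parameters): truncation renormalizes the density over the bounds, while censoring keeps the full density but records out-of-bounds draws at the bounds as point masses.

import pymc as pm

with pm.Model():
    # Truncated: values outside [0, 1] are impossible by assumption;
    # the normal density is renormalized over the interval.
    y_trunc = pm.TruncatedNormal('y_trunc', mu=0.5, sigma=0.3, lower=0, upper=1)

    # Censored: out-of-bounds values do occur, but are recorded at the
    # bounds, producing point masses at 0 and 1.
    latent = pm.Normal.dist(mu=0.5, sigma=0.3)
    y_cens = pm.Censored('y_cens', latent, lower=0, upper=1)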


I used a standard scikit-learn min/max scaler.
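For reference, MinMaxScaler maps the sample minimum to exactly 0 and the sample maximum to exactly 1, so multiple raw values sitting at a cap all land on 1 after scaling. A toy sketch with made-up values:

import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Several raw observations sit at a cap of 10
raw = np.array([[3.0], [7.0], [10.0], [10.0], [10.0]])

obs_array = MinMaxScaler().fit_transform(raw).ravel()
print(obs_array)  # [0.    0.5714...    1.    1.    1.]  -- spike at 1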

Oh this is interesting. I didn’t know this existed. Thank you! I will try these techniques.

Your priors look very strict given your data. The data mean is clearly at or near 1, but you specified the prior as Normal(0, 0.05). Your sigma prior may also be pretty extreme.

As you can see in the trace plot, the model is not really converging; the chains are arriving at different conclusions.
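A quick prior predictive check makes this visible (a minimal sketch, reusing the cannibal_model context from above):

import arviz as az

with cannibal_model:
    prior_idata = pm.sample_prior_predictive()

# With alpha ~ Normal(0, 0.05), essentially all prior mass for mu lies
# within about +/-0.15 of zero, far from data concentrated near 1.
az.plot_dist(prior_idata.prior['alpha'].values)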

This seems to be doing better. I changed the model to the following:

coords = {'cann': cann}
with pm.Model(coords=coords) as cannibal_model:
    cannibal = pm.Data('cannibal', cann_idx, mutable=True)
    obs = pm.Data('obs', obs_array, mutable=True)

    # beta = pm.Normal('beta', mu=0, sigma=0.1)
    alpha = pm.Normal('alpha', mu=0.99, sigma=0.07, dims=['cann'])
    sigma = pm.HalfNormal('sigma', 1)
    y_latent = pm.Normal.dist(mu=alpha[cannibal], sigma=sigma)

    eaches = pm.Censored('predicted_eaches',
                         dist=y_latent,
                         lower=0,
                         upper=1,
                         observed=obs)

    idata = pm.sampling_jax.sample_numpyro_nuts(draws=1000, tune=2000, target_accept=0.95)

[Image: trace plot for the censored model]

[Image: posterior predictive plot for the censored model]

I think this is a good base to expand out from. Thank you for the help.