# Truncated normal distribution with optimal truncation thresholds via DensityDist

Hello,

I am a newcomer to PyMC3, and would be grateful for some guidance on how to model a truncated normal distribution with optimal truncation thresholds.

Toy data was generated as follows:

• Generate a normal sample.
• Truncate the distribution, i.e. sample[(sample > a) & (sample < b)].
• Add a few points beyond the upper threshold b, taken from the original sample (this simulates the real business problem, since we don't know the true truncation thresholds, i.e. thresholds a and b are free variables). The same could be done for a, but I omitted it for simplicity.
• The goal is to get posteriors for 1) the truncated normal distribution (mu, sigma) and 2) the optimal truncation points (a, b).

Minimal reproducible example:

```python
import theano.tensor as tt
import numpy as np
import matplotlib.pyplot as plt
import pymc3 as pm

# true parameters of the toy data
a = 5
b = 16
niter = 4000

sigma = 5.
mu = 13.
size = 5000
samples = np.random.normal(mu, sigma, size)

# truncate and add a few points beyond the point of truncation
random_high = samples[samples > b][:20]
truncated = samples[(samples > a) & (samples < b)]
truncated = np.hstack([truncated, random_high])


def cdf(x):
    """Cumulative distribution function of the standard normal distribution."""
    return 0.5 * (1.0 + tt.erf(x / tt.sqrt(2)))


def pdf(x):
    """Probability density function of the standard normal distribution."""
    return 1.0 / tt.sqrt(2 * np.pi) * tt.exp(-0.5 * (x ** 2))


def logp_tr(x, mu, std, a, b):
    """Define logp based on https://en.wikipedia.org/wiki/Truncated_normal_distribution"""
    f = pdf((x - mu) / std) / std / (cdf((b - mu) / std) - cdf((a - mu) / std))
    return tt.switch(np.any(x > b) | np.any(x < a), -1e10, tt.log(f))


with pm.Model() as tmodel:
    # Just some generic priors
    mu = pm.Normal('mu', mu=25, sd=25)
    std = pm.Uniform('std', lower=0.1, upper=80)
    a = pm.Uniform('low', lower=0, upper=10)
    b = pm.Uniform('high', lower=10, upper=40)

    x = pm.DensityDist('dist', logp_tr,
                       observed={'x': truncated, 'mu': mu, 'std': std, 'a': a, 'b': b})

    start = pm.find_MAP()

    # Sampler
    step = pm.Metropolis()

    # Sampling
    trace = pm.sample(niter, step=step, njobs=4)

pm.traceplot(trace[1000:], varnames=['mu', 'std', 'low', 'high']);
```

The trace I am getting is shown below [traceplot image omitted]. As can be seen, the posterior for b is wrong (and as a result the posterior of mu is shifted). I suspect I am not defining the `logp_tr` function for the custom distribution correctly. The pdf is based on Wikipedia and seems to be correct (based on previous attempts with fixed a and b).
The correct posteriors should have maxima near mu = 13, sigma = 5, a = 5, b = 16 (based on the distribution the data was generated from).
Any suggestions on how to fix the problem?

I think you are on the right track - you are in effect implementing a mixture distribution with a truncated normal as the main component and a noise component.

I would play with the priors a bit, and also try adjusting the value in

```python
return tt.switch(np.any(x > b) | np.any(x < a), -1e10, tt.log(f))
```

Not sure if `-1e10` is a good scale in this case.
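For a rough sense of the scale problem (my own back-of-the-envelope, with assumed numbers, not computed from the model above): with ~20 out-of-bounds points each contributing -1e10, the penalty swamps the logp of the ~5000 in-bounds points, so the sampler is effectively pushed to widen (a, b) until nothing is excluded:

```python
# assumed, illustrative numbers -- not taken from the model above
penalty_outside = 20 * -1e10   # out-of-bounds points at the hard floor
logp_inside = 5000 * -3.0      # in-bounds points at a typical per-point logp
print(penalty_outside, logp_inside)  # -2e+11 vs -1.5e+04
```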

Three things, which I am not sure are helpful, but here goes:

1. This example exposes a bug in `sample_prior_predictive`, which is easy enough to fix later, so thanks!

2. Here’s your prior (plot omitted). It is worth noting that the truncated normal distribution sometimes returns `inf`, which might have to do with extreme values or bad priors? (See the small check after this list.)

3. It wouldn’t be so bad to combine my code snippet below and yours to make a proper TruncatedNormal distribution, if anyone’s interested in a PR…
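Regarding the `inf`s in (2), one plausible mechanism (my guess, not verified): a prior draw that puts `mu` far from `[a, b]` makes the normalizer in `logp_tr` underflow to zero in float64, so the division blows up:

```python
import numpy as np
from scipy.stats import norm

# mu = 90 is not unreasonable under the Normal(25, 25) prior; std = 1 is allowed
mu_v, std_v, a_v, b_v = 90.0, 1.0, 5.0, 16.0
normalizer = norm.cdf((b_v - mu_v) / std_v) - norm.cdf((a_v - mu_v) / std_v)
print(normalizer)  # 0.0 -- dividing by it yields inf/nan in logp_tr
```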

For what it is worth, this is the code I used for generating the random samples:

```python
import scipy.stats as st

def truncated_normal_random(mu, std, a, b):
    def gen_random(point=None, size=None):
        mu_v, std_v, a_v, b_v = pm.distributions.draw_values(
            [mu, std, a, b], point=point, size=size)
        return pm.distributions.generate_samples(st.truncnorm.rvs,
                                                 a=(a_v - mu_v) / std_v,
                                                 b=(b_v - mu_v) / std_v,
                                                 loc=mu_v,
                                                 scale=std_v,
                                                 size=size,
                                                 )
    return gen_random

...

x = pm.DensityDist('dist', logp_tr,
                   random=truncated_normal_random(mu, std, a, b),
                   observed={'x': truncated, 'mu': mu, 'std': std, 'a': a, 'b': b})
```
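For anyone picking this up, here is a rough sketch of how the two pieces might be combined into a distribution class (untested; it relies on the `cdf`/`pdf` helpers defined earlier in the thread, and is just meant to indicate the shape such a PR could take):

```python
import scipy.stats as st

class TruncatedNormal(pm.Continuous):
    """Sketch: logp follows logp_tr, random follows truncated_normal_random."""

    def __init__(self, mu=0., sd=1., lower=0., upper=1., *args, **kwargs):
        super(TruncatedNormal, self).__init__(*args, **kwargs)
        self.mu = tt.as_tensor_variable(mu)
        self.sd = tt.as_tensor_variable(sd)
        self.lower = tt.as_tensor_variable(lower)
        self.upper = tt.as_tensor_variable(upper)

    def logp(self, value):
        mu, sd, a, b = self.mu, self.sd, self.lower, self.upper
        norm = cdf((b - mu) / sd) - cdf((a - mu) / sd)
        logdens = tt.log(pdf((value - mu) / sd) / sd / norm)
        # hard -inf outside the bounds; see the discussion below on
        # whether a soft noise floor is preferable
        return tt.switch(tt.or_(tt.lt(value, a), tt.gt(value, b)),
                         -np.inf, logdens)

    def random(self, point=None, size=None):
        mu_v, sd_v, a_v, b_v = pm.distributions.draw_values(
            [self.mu, self.sd, self.lower, self.upper], point=point, size=size)
        return pm.distributions.generate_samples(
            st.truncnorm.rvs,
            a=(a_v - mu_v) / sd_v, b=(b_v - mu_v) / sd_v,
            loc=mu_v, scale=sd_v, size=size)
```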

@junpenglao, @colcarroll - thanks for your input!

@junpenglao - about the -1e10 value: yes, you are right, it is arbitrary. Initially I tried -np.inf (which should be the case by definition, since the pdf of a truncated normal is 0 out of bounds), but the solution exploded. Any hints on how to choose the scaling?
In fact, do you think it is more appropriate to model this case with a mixture distribution?

@colcarroll - in (1), what is the bug you are mentioning? Re (2) and (3): once it is working, you are right, it is worth making a proper TruncatedNormal. Thanks for supplying the random function!

Sorry - I did not provide much context! We just merged `pm.sample_prior_predictive` to master; it efficiently samples the prior predictive (but requires that a `DensityDist` has a `.random` method). It fails right now on this example for silly reasons, but https://github.com/pymc-devs/pymc3/pull/3045 fixes it.

I used that method to generate 4,000 draws from the prior predictive and to make the prior plot mentioned above.
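Roughly, the call looks like this (a sketch against the model above; the exact call is my reconstruction):

```python
# sketch: 4,000 draws from the prior predictive of tmodel
with tmodel:
    prior = pm.sample_prior_predictive(samples=4000)

plt.hist(prior['dist'].ravel(), bins=100);  # rough view of the prior predictive
```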

It would be interesting to model it as a mixture, but right now I think it is better to fix the mixture parameter and try to make it work first.

As for the scaling, I think you can check it by:

```python
def logp_tr(x, mu, std, a, b, outlier_p):
    """Define logp based on https://en.wikipedia.org/wiki/Truncated_normal_distribution"""
    f = pdf((x - mu) / std) / std / (cdf((b - mu) / std) - cdf((a - mu) / std))
    return tt.switch(tt.or_(tt.lt(x, a), tt.gt(x, b)), tt.log(outlier_p), tt.log(f))

x = np.linspace(0, 25, 100)
plt.plot(x, tt.exp(logp_tr(x, mu, sigma, a, b, 1/1000*.01)).eval());
```
This gives you an idea of whether the implementation is correct or not.

I had quite a lot of difficulty sampling this model with NUTS, so stronger regularization from the prior is definitely needed.

Thanks, I see. `pm.sample_prior_predictive` is handy!

@junpenglao - thank you, this is very useful. I am starting to get results that make sense by playing with the scaling parameter `outlier_p`. Actually, it makes sense that it shouldn't be -infinity, since by specifying a constant `outlier_p` we are specifying uniformly distributed noise? So this logp describes a custom distribution which is a truncated normal inside the bounds and noise outside the bounds?
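If that reading is right, the constant can be derived from the noise component directly (my back-of-the-envelope, and only one way to read the `1/1000 * .01` above):

```python
# assumed: outliers uniform over a window of width L, with mixture weight w,
# so their density is w / L
w = 0.01        # assumed weight of the noise component
L = 1000.0      # assumed width of the noise window
outlier_p = w / L
print(outlier_p)  # 1e-05, matching 1/1000 * .01
```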

Another question: if I want to refactor this into a proper TruncatedNormal class (together with @colcarroll's code) and submit it as a PR, do we actually need to decrease this parameter to -inf, or should we leave it as a choice for the user?

Yep, I am setting the noise as a low probability; changing the scaling assigns a different weight to the mixture.

It would be useful to have a truncated normal distribution with either the lower bound, the upper bound, or both specified. I would suggest implementing one without the noise/outlier component, as that would be the more general case.
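A sketch of how the optional bounds might be handled in such an implementation (assuming the `cdf` helper from earlier in the thread; the function name is mine):

```python
def truncation_mass(mu, std, lower=None, upper=None):
    """Probability mass between the bounds; either bound may be omitted."""
    upper_cdf = 1.0 if upper is None else cdf((upper - mu) / std)
    lower_cdf = 0.0 if lower is None else cdf((lower - mu) / std)
    return upper_cdf - lower_cdf
```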