Truncated normal distribution with optimal truncation thresholds via DensityDist

Hello,

I am a newcomer to PyMC3 and would be grateful for some guidance on how to model a truncated normal distribution with optimal truncation thresholds.

Toy data was generated as follows:

  • Generate a sample from a normal distribution.
  • Truncate the distribution, i.e. sample[(sample > a) & (sample < b)].
  • Add a few points beyond the upper threshold b, taken from the original sample (this simulates the real business problem, where the true truncation thresholds are unknown, i.e. a and b are free variables). The same could be done for a, but it is omitted for simplicity.
  • The goal is to get posteriors for 1) the truncated normal parameters (mu, sigma) and 2) the optimal truncation points (a, b).

Minimal reproducible example:

import theano.tensor as tt
import numpy as np
import matplotlib.pyplot as plt
import pymc3 as pm

a = 5
b = 16
niter = 4000

sigma = 5.
mu = 13.
size=5000
samples = np.random.normal(mu, sigma, size)

# truncate and add few points beyond the point of truncation
random_high = samples[samples>b][:20]
truncated = samples[(samples > a) & (samples < b)]
truncated = np.hstack([truncated, random_high])

def cdf(x):
    """Cumulative distribution function of standard normal distribution"""
    
    return 0.5*(1.0+tt.erf(x/tt.sqrt(2)))

def pdf(x):
    """Cumulative distribution function of standard normal distribution"""
    
    return 1.0/tt.sqrt(2*np.pi)*tt.exp(-0.5*(x**2))

def logp_tr(x, mu, std, a, b):
    """ Define logp based on https://en.wikipedia.org/wiki/Truncated_normal_distribution"""

    f = pdf((x-mu)/std)/std/((cdf((b-mu)/std) - cdf((a-mu)/std)))
    return tt.switch(np.any(x>b) | np.any(x<a), -1e10, tt.log(f)) 
                    

with pm.Model() as tmodel:
    # Just some generic priors
    mu = pm.Normal('mu', mu=25, sd=25)
    std = pm.Uniform('std', lower=0.1, upper=80)
    a = pm.Uniform('low', lower=0, upper=10)
    b = pm.Uniform('high', lower=10, upper=40)

    x = pm.DensityDist('dist', logp_tr,
                       observed={'x': truncated, 'mu': mu, 'std': std, 'a': a, 'b': b})

    start = pm.find_MAP()

    # Sampler
    step = pm.Metropolis()

    # Sampling
    trace = pm.sample(niter, step=step, njobs=4)

pm.traceplot(trace[1000:], varnames=['mu', 'std', 'low', 'high']);

The trace I am getting is:
[trace plot of mu, std, low and high]

As can be seen, the posterior for b is wrong (and as a result the posterior of mu is shifted). I suspect I am not defining the logp_tr function for the custom distribution correctly. The pdf is based on Wikipedia and seems to be correct (based on previous attempts with fixed a and b).
The correct posteriors should have maxima near mu=13, sigma=5, a=5, b=16 (based on the distribution the data was generated from).
Any suggestions on how to fix the problem?
Thanks in advance

I think you are on the right track - you are in effect implementing a mixture distribution with a truncated normal as the main component and a noise component.

I would play with the prior a bit, and also try adjusting the value in

return tt.switch(np.any(x>b) | np.any(x<a), -1e10, tt.log(f))

Not sure if -1e10 is a good scale in this case.
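
To make that mixture framing concrete, here is a rough sketch of the same density written out explicitly as a two-component mixture, reusing the pdf/cdf helpers defined above (the noise weight w and the uniform noise support [0, 40] are illustrative assumptions, not values from this thread):

def mixture_density(x, mu, std, a, b, w=0.01):
    """Sketch: (1 - w) * truncated normal on [a, b]  +  w * uniform noise."""
    trunc = pdf((x - mu) / std) / std / (cdf((b - mu) / std) - cdf((a - mu) / std))
    trunc = tt.switch(tt.and_(tt.ge(x, a), tt.le(x, b)), trunc, 0.0)
    noise = 1.0 / 40.0  # flat density over the assumed noise support [0, 40]
    return (1 - w) * trunc + w * noise

# a DensityDist logp would then be tt.log(mixture_density(x, mu, std, a, b))

The constant you put into the switch plays the role of w * noise here, which is why its scale matters.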

Three things, which I am not sure are helpful, but here goes

  1. This example exposes a bug in sample_prior_predictive, which is easy enough to fix later, so thanks!

  2. Here’s your prior. It is worth noting that the truncated normal distribution is sometimes returning inf, which might have to do with extreme values or bad priors?

  3. It wouldn’t be so bad to combine my code snippet below and yours to make a proper TruncatedNormal distribution, if anyone’s interested in a pr…

For what it is worth, this is the code I used for generating the random samples:

import scipy.stats as st

def truncated_normal_random(mu, std, a, b):
    def gen_random(point=None, size=None):
        mu_v, std_v, a_v, b_v = pm.distributions.draw_values([mu, std, a, b], point=point, size=size)
        return pm.distributions.generate_samples(st.truncnorm.rvs,
                                                 a=(a_v - mu_v)/std_v, 
                                                 b=(b_v - mu_v) / std_v, 
                                                 loc=mu_v, 
                                                 scale=std_v,
                                                 size=size,
        )
    return gen_random

...

    x = pm.DensityDist('dist', logp_tr,
                       random=truncated_normal_random(mu, std, a, b),
                       observed={'x': truncated, 'mu': mu, 'std': std, 'a': a, 'b': b})


@junpenglao, @colcarroll - thanks for your input!

@junpenglao - about the -1e10 value, yes you are right, it is arbitrary. Initially I tried -np.inf (which should be the value by definition, since the pdf of a truncated normal is 0 out of bounds), but the solution exploded. Any hints on how to choose the scaling?
In fact, do you think it is more appropriate to model this case with a mixture distribution?

@colcarroll - in (1), what is the bug you are mentioning? Re (2) and (3) - once it is working, you are right, it is worth making a proper TruncatedNormal; thanks for supplying the random function!

Sorry - I did not provide much context! We just merged pm.sample_prior_predictive to master; it efficiently samples the prior predictive (but requires that a DensityDist has a .random method). It fails right now on this example for silly reasons, but https://github.com/pymc-devs/pymc3/pull/3045 fixes it.

I used that method to generate 4,000 draws from the prior predictive, and to make the above plot.
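
For reference, this is roughly the call involved (a minimal sketch, assuming the DensityDist has the random= argument attached as in the snippet above, and that the draws come back under the variable's name 'dist'):

with tmodel:
    prior = pm.sample_prior_predictive(samples=4000)

# quick look at the prior predictive draws
plt.hist(np.asarray(prior['dist']).ravel(), bins=100);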

It would be interesting to model it as a mixture, but right now I think it is better to fix the mixture parameter and try to make it work first.

As for the scaling, I think you can check it by:

def logp_tr(x, mu, std, a, b, outliner_p):
    """ Define logp based on https://en.wikipedia.org/wiki/Truncated_normal_distribution"""
    f = pdf((x-mu)/std)/std/((cdf((b-mu)/std) - cdf((a-mu)/std)))
    return tt.switch(tt.or_(tt.lt(x, a), tt.gt(x, b)), tt.log(outliner_p), tt.log(f))

x = np.linspace(0, 25, 100)
plt.plot(x, tt.exp(logp_tr(x, mu, sigma, a, b, 1/1000*.01)).eval());

[plot of exp(logp_tr) evaluated over x in [0, 25]]

This gives you an idea of whether the implementation is correct or not.

I have quite a lot of difficulty sampling this model with NUTS; stronger regularization from the prior is definitely needed.
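
For example, something along these lines might help (a sketch only; the concrete numbers are my own guesses, not values validated on this problem):

with pm.Model() as tmodel:
    # tighter priors than Normal(25, 25) / Uniform(0.1, 80) in the original model
    mu = pm.Normal('mu', mu=15., sd=5.)
    std = pm.HalfNormal('std', sd=10.)
    a = pm.Uniform('low', lower=0., upper=10.)
    b = pm.Uniform('high', lower=10., upper=25.)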

Thanks, I see. pm.sample_prior_predictive is handy!

@junpenglao - thank you, this is very useful. I am starting to get results that make sense by playing with the scaling parameter 'outliner_p'. Actually, it makes sense that it shouldn't be -infinity: by specifying a constant outliner_p we are specifying uniformly distributed noise, right? So this logp describes a custom distribution which is a truncated normal inside the bounds and noise outside of the bounds?

Another question is, if I want to refactor this into a proper TruncatedNormal class (together with @colcarroll's code) and submit it as a PR - do we actually need to decrease this parameter to -inf, or should we leave it as a choice for the user?

Yep, I am setting the noise as a low probability; if you change the scaling here, it potentially assigns a different weight to the mixture.

It would be useful to have a truncated normal distribution with either the lower bound, the upper bound, or both specified. I would suggest implementing one without the noise/outlier component, as that would be the more general case.
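
For anyone picking this up, here is a rough, untested sketch of what such a class might look like, combining the logp from the question with the random function from earlier in the thread (the class name, argument names and defaults are my own choices, not an existing PyMC3 API):

import numpy as np
import scipy.stats as st
import theano.tensor as tt
import pymc3 as pm

def std_pdf(z):
    """Standard normal pdf (same formula as the pdf helper above)."""
    return tt.exp(-0.5 * z ** 2) / tt.sqrt(2 * np.pi)

def std_cdf(z):
    """Standard normal cdf (same formula as the cdf helper above)."""
    return 0.5 * (1.0 + tt.erf(z / tt.sqrt(2.)))

class TruncatedNormal(pm.Continuous):
    """Normal distribution truncated to [lower, upper] (sketch only)."""

    def __init__(self, mu=0., sd=1., lower=-np.inf, upper=np.inf, *args, **kwargs):
        super(TruncatedNormal, self).__init__(*args, **kwargs)
        self.mu = tt.as_tensor_variable(mu)
        self.sd = tt.as_tensor_variable(sd)
        self.lower = tt.as_tensor_variable(lower)
        self.upper = tt.as_tensor_variable(upper)

    def logp(self, value):
        mu, sd = self.mu, self.sd
        lower, upper = self.lower, self.upper
        # normalizing constant of the truncated density
        norm = std_cdf((upper - mu) / sd) - std_cdf((lower - mu) / sd)
        logp_in = tt.log(std_pdf((value - mu) / sd)) - tt.log(sd) - tt.log(norm)
        in_bounds = tt.and_(tt.ge(value, lower), tt.le(value, upper))
        # -inf outside the bounds by definition; whether to soften this
        # (the outliner_p discussion above) is left as a design choice
        return tt.switch(in_bounds, logp_in, -np.inf)

    def random(self, point=None, size=None):
        mu, sd, lower, upper = pm.distributions.draw_values(
            [self.mu, self.sd, self.lower, self.upper], point=point, size=size)
        return pm.distributions.generate_samples(
            st.truncnorm.rvs,
            a=(lower - mu) / sd, b=(upper - mu) / sd,
            loc=mu, scale=sd, size=size)

Whether the logp should return -inf outside the bounds or expose a user-supplied floor (like outliner_p above) is exactly the open design question from the previous posts.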