How do I implement an upper limit log normal distribution


#1

The upper limit log normal (ULLN) distribution is similar to a lognormal distribution but with an extra parameter for the upper limit (ymax). I am having difficulty implementing it in pymc3 (i am new to pymc3). This is my attempt at coding it, but it does not work as expected.

import numpy as np
import matplotlib.pyplot as plt
import pymc3 as pm

## generate surrogate data
sigma=1
mu=0
ymax=5
q = np.exp(np.random.randn(500)*sigma+mu)
y = (q*ymax)/(q + ymax)

plt.hist(np.log(y))
plt.xlabel('log surrogate')


## setup model

ullnmodel = pm.Model()
yobsmax = np.amax(y)

with ullnmodel:
    ymax = pm.Pareto('ymax',alpha=1,m=yobsmax)
    
    mu = pm.Normal('mu', mu=0, sd=5)
    sigma = pm.Lognormal('sigma', mu=0, sd=5)

    qt =  pm.math.log(y*ymax) - pm.math.log(ymax  - y)
    q = pm.Normal('q', mu=mu, sd = sigma, observed = qt)
    

    
startpoint = {'mu': np.mean(np.log(y)), 'sigma': np.std(np.log(y)), 'ymax': yobsmax*2.0}
map_estimate = pm.find_MAP(model=ullnmodel,start=startpoint)
map_estimate


#3

Your use case is more similar to Truncated Inverse normal distribution (also known as Wald distribution). You can have a look at the discussion there which is quite in-depth.

In general, if you are transforming the observed variable, you need to make sure to adjust the volume changes by adding the jacobian. In your code above, pt ~ Normal(mu, sd) does not imply y ~ULLN(exp(mu), sd)


#4

I’ve read the Wald discussion, but I don’t fully understand how it relates to my problem. If i further simplify then this is what i am trying to build in pseudocode:

theta ~ Normal(mu=0,sd=1) # a prior on theta
q = f(y, theta) #a function that makes a nonlinear transformation of my observations (y) which depends on theta
q ~ Normal(mu=0,sd=1)

Does anybody know of any other examples where something similar is done?


#5

Your model in the main post is the same as the pseudocode. But you might find that the estimated parameter theta is not correct, that’s because the volume change in the forward and backward transformation is not accounted for.
You can have a look at an example here: https://github.com/junpenglao/Planet_Sakaar_Data_Science/blob/master/Miscellaneous/Regression%20with%20a%20twist.ipynb
where f(y, theta) is a linear function. In the notebook, model2_ and model2 is quite similar, but only the output from model2 that accounted for the volume change is correct.


#6

Thank you. This is super helpful (and a great example). I hope I can figure it out now


#7

For anybody stumbling on this thread. Here’s a link with some reading material explaining why this is needed. https://tpapp.github.io/post/jacobian-chain/


#8

I have just implemented it. I post it here for other people to learn from. Please correct me if I have misunderstood something.

In my case i have:

  • f(y) = log(ymax*y/(ymax-y))
  • f(y) ~ Normal(mu,sigma)
  • det(log(J)) = log(df/dy) = log(-ymax/(y*(y - ymax)))

which I turn into this code:

import numpy as np
import pymc3 as pm

## generate surrogate data
sigma=1
mu=0
ymax=5
q = np.exp(np.random.randn(500)*sigma+mu)
y = (q*ymax)/(q + ymax)

## setup model

ullnmodel = pm.Model()
yobsmax = np.amax(y)

with ullnmodel:
    ymax = pm.Pareto('ymax',alpha=1,m=yobsmax)
    
    mu = pm.Normal('mu', mu=0, sd=5)
    sigma = pm.Lognormal('sigma', mu=0, sd=5)
    
    qt =  pm.math.log(y*ymax) - pm.math.log(ymax  - y)
    q = pm.Normal('q', mu=mu, sd = sigma, observed = qt)
    pm.Potential('jacob_det',pm.math.log(-ymax/(y*(y - ymax))))
    
startpoint = {'mu': np.mean(np.log(y)), 'sigma': np.std(np.log(y)), 'ymax': 5}
map_estimate = pm.find_MAP(model=ullnmodel,start=startpoint)
map_estimate