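For the project I have been working on, I am trying to implement a model that is a sum of lognormals, so I wanted to test how PyMC3 handles a general lognormal of the form $f(x) = \frac{1}{(x-\theta)\,\sigma\sqrt{2\pi}} \exp\left(-\frac{\ln^2\left((x-\theta)/m\right)}{2\sigma^2}\right)$, where m is the median, sigma is the shape parameter and theta is the location parameter.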
Here is my implementation in PyMC3:
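import numpy as np
import pymc3 as pm
import theano.tensor as tt

# x and lognorm are NumPy arrays defined beforehand (see the description below the code)
with pm.Model() as model:
    mval = pm.Normal('mval', mu=5.0, sigma=5.0)  # median m
    sval = pm.Normal('sval', mu=2.0, sigma=2.0)  # shape sigma
    tval = pm.Normal('tval', mu=5.0, sigma=5.0)  # location theta
    LN = tt.exp(-tt.log((x-tval)/mval)**2/(2*sval**2))/((x-tval)*sval*tt.sqrt(2*np.pi))
    LN = tt.switch(tt.isnan(LN), 0.0, LN)  # zero out NaNs where tval > x
    y_ = pm.Normal("y", mu=LN, observed=lognorm)
    trace = pm.sample(1000, tune=1000)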
The switch is there so I do not get NaN values when tval > x. I also took a note from the Stan people and used normal priors instead of uniform ones. “lognorm” is obtained by evaluating the same equation with the parameters mval=2.5, sval=1.5 and tval=7.5. The number of points is set using numpy.linspace.
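For anyone who wants to reproduce the setup, the data generation described above can be sketched in plain NumPy. This is only a sketch: the function name `shifted_lognormal_pdf` is made up for illustration, and the grid range and number of points are assumptions, since the post does not state them.

```python
import numpy as np

def shifted_lognormal_pdf(x, m, s, theta):
    """Lognormal density with median m, shape s and location theta (0 for x <= theta)."""
    x = np.asarray(x, dtype=float)
    z = x - theta
    inside = z > 0
    # Swap out-of-support points for 1.0 before taking the log, so no NaN is produced.
    z_safe = np.where(inside, z, 1.0)
    pdf = np.exp(-np.log(z_safe / m) ** 2 / (2 * s ** 2)) / (z_safe * s * np.sqrt(2 * np.pi))
    return np.where(inside, pdf, 0.0)

# Assumed grid: the original range and number of points are not given in the post.
x = np.linspace(0.0, 60.0, 500)
lognorm = shifted_lognormal_pdf(x, m=2.5, s=1.5, theta=7.5)
```

Note that `lognorm` built this way is a noise-free curve, not a set of random draws, so the Normal likelihood in the model is fitting the curve itself.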
When I run the sampler, I am surprised to find that it diverges 60-90 times per chain after tuning. The mean parameter values found are off from the ones I used to generate lognorm and have a relatively high sd. I also find that the number of effective samples is low (less than 10%).
Can someone help me understand why the sampler is having trouble with this lognormal? Again, I am surprised by the result because this lognormal only has 3 parameters. Is there some way to define the lognormal differently that keeps the sampler from diverging?
I’m not well versed in lognormals, but is there any reason why you’re not using the built-in distribution?
Thanks for linking the built-in distributions. I took a look at them, and the lognormal there is missing the location parameter (theta) that shifts the function along the x-axis. It is also parameterized by mu, the mean of the log of the variable, instead of the median m of the distribution. From trying different priors, the location parameter and the median seem to be the most obvious problems here. Theta just takes on a value close to the middle of the prior you define (if it is uniform), and m is just off.
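The two parameterizations mentioned above are related by mu = log(m), with theta entering only as a shift of x. A small NumPy check of that relationship, as a sketch (the function names and the evaluation points are made up for illustration):

```python
import numpy as np

def builtin_style_pdf(y, mu, sigma):
    """Two-parameter lognormal density in the (mu, sigma) form, valid for y > 0."""
    return np.exp(-(np.log(y) - mu) ** 2 / (2 * sigma ** 2)) / (y * sigma * np.sqrt(2 * np.pi))

def median_location_pdf(x, m, s, theta):
    """The same density written with median m and location theta, valid for x > theta."""
    return np.exp(-np.log((x - theta) / m) ** 2 / (2 * s ** 2)) / ((x - theta) * s * np.sqrt(2 * np.pi))

m, s, theta = 2.5, 1.5, 7.5
x = np.linspace(8.0, 30.0, 50)  # arbitrary points, all above theta

# Shifting x by theta and setting mu = log(m) should reproduce the custom form.
same = np.allclose(median_location_pdf(x, m, s, theta),
                   builtin_style_pdf(x - theta, np.log(m), s))
print(same)
```

So the median form is a reparameterization of the (mu, sigma) form; only the location shift is genuinely missing from the two-parameter version.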
Okay, I have tried limiting the function to only the median and sigma. I tried a couple of different parameter values and priors, and I think I understand where the problem lies: I just have to specify sigma for the likelihood. I am not sure what happens if sigma is not specified (I would suspect something like sigma=1.0). I just used the standard deviation of the input data, and that seemed to work.
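One way to see why the likelihood sigma matters: the curve values are all well below 1, so if the omitted sigma does fall back to something like 1.0, the Normal likelihood barely separates the true parameters from clearly wrong ones. The sketch below compares the log-likelihood gap for sigma=1.0 against sigma equal to the standard deviation of the data. The grid and the "wrong" parameter values are assumptions chosen only for illustration.

```python
import numpy as np

def shifted_lognormal_pdf(x, m, s, theta):
    """Lognormal density with median m, shape s and location theta (0 for x <= theta)."""
    z = x - theta
    inside = z > 0
    z_safe = np.where(inside, z, 1.0)  # avoid log of non-positive values
    pdf = np.exp(-np.log(z_safe / m) ** 2 / (2 * s ** 2)) / (z_safe * s * np.sqrt(2 * np.pi))
    return np.where(inside, pdf, 0.0)

def normal_loglike(data, mu, sigma):
    """Sum of independent Normal log-densities of data around mu."""
    return np.sum(-0.5 * ((data - mu) / sigma) ** 2 - np.log(sigma) - 0.5 * np.log(2 * np.pi))

x = np.linspace(0.0, 60.0, 500)                   # assumed grid
data = shifted_lognormal_pdf(x, 2.5, 1.5, 7.5)    # parameters from the post
wrong = shifted_lognormal_pdf(x, 5.0, 2.0, 5.0)   # deliberately wrong parameters

gaps = []
for sigma in (1.0, data.std()):
    # How much the likelihood prefers the true curve over the wrong one.
    gap = normal_loglike(data, data, sigma) - normal_loglike(data, wrong, sigma)
    gaps.append(gap)
    print(f"sigma={sigma:.4f}  log-likelihood gap={gap:.2f}")
```

The gap scales as 1/sigma², so a sigma matched to the scale of the data makes the likelihood far more informative about m, sigma and theta.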
Out of curiosity, did you use your custom function or the built-in one?
The custom one. I needed the location parameter theta for what I am working on, so I wanted to try using my custom function.
Thanks, good to know!
I’m not used to lognormals, but if you feel like the location parameter is customary and would be useful to have in the built-in distribution, I think a PR would be greatly appreciated.
I would be glad to. I have not tried to do a PR before, but I will try it. The lognormal can be written a couple of ways, but let me implement a modified version of the one already used.
Awesome, thank you! Don’t hesitate if you have questions – not a PR expert either, but maybe I’ll be able to help
Okay, I have opened an issue on GitHub with a first try here. I am not entirely sure how to test whether this works, and it might still need some checks that x > theta.
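As a rough idea of what the x > theta check could look like, here is a NumPy sketch of a shifted lognormal log-density in the (mu, sigma) form that returns -inf outside the support. This is not the code from the issue. The function name and argument order are hypothetical, and a PyMC3 version would need the equivalent Theano operations.

```python
import numpy as np

def shifted_lognormal_logpdf(x, mu, sigma, theta):
    """Log-density of a lognormal shifted by theta; -inf wherever x <= theta."""
    x = np.asarray(x, dtype=float)
    z = x - theta
    inside = z > 0
    # Use a dummy value of 1.0 outside the support so the log never sees z <= 0.
    log_z = np.log(np.where(inside, z, 1.0))
    logp = -((log_z - mu) ** 2) / (2 * sigma ** 2) - log_z - np.log(sigma) - 0.5 * np.log(2 * np.pi)
    return np.where(inside, logp, -np.inf)

# Points below, at, and above theta = 7.5.
vals = shifted_lognormal_logpdf([7.0, 7.5, 10.0], mu=np.log(2.5), sigma=1.5, theta=7.5)
print(vals)
```

A simple test along these lines is to compare exp(logp) against the closed-form density at a few points above theta and confirm that points at or below theta give -inf.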