How to formulate a Lognormal likelihood using PyMC3?

I have formulated the model as below for Normal priors and Normal Likelihood.

with pm.Model() as bm:

    # Intercept
    alpha = pm.Normal('alpha', mu=17.25, sigma=3.75)

    # Slope
    beta = pm.Normal('beta', mu=0.363, sigma=0.050)

    # Standard deviation
    sigma = sigma

    # Estimate of mean
    mean = alpha + beta*X_arr

    # Observed values
    Y_obs = pm.Normal('Y_obs', mu = mean, sd = sigma, observed = Y_arr)

The below image has been modeled as Y_obs.

My question is: how would I model the same relationship if I were to use Lognormal priors and a Lognormal likelihood? What would change in the above expression?

I’m having a hard time making sense of how your model code is connected to the equation in your image. I’m not clear on what parameters are meant to correspond to c, \sigma, and v. In terms of your code, your sigma appears to be undefined. It would be helpful if you could provide more info.

If your question is specifically about a lognormal distribution, it has its own class in pymc3. I.e. pm.Lognormal() (Continuous — PyMC3 3.10.0 documentation)

Hope that helps.

Thank you for your reply.

The above formulation is based on this linear relationship: τf = c + σv + ε
In my code, alpha is c, beta is v, and sigma is s_ε (which is constant in my case).
For the normal distribution, mean = c + σv and variance = s_ε^2

I did go through the documentation for pm.Lognormal but I couldn’t figure out how to model the same considering my problem statement.

I hope this helps.

I think if you just replace every occurrence of pm.Normal by pm.Lognormal you get what you need (but note that sd is called sigma). Or am I misunderstanding you?

If that is the case, what would my mathematical expression look like? Could you please write it down for me?

Yes, sd = sigma = sε (which is constant in my case)

p(\tau_f | c, v, s_e) = \frac{1}{\tau_f s_e \sqrt{2\pi}} \exp\left[-\frac{(\ln(\tau_f) - c - \sigma v)^2}{2 s_e^2}\right]

The lognormal simply means the log of your random variable (\tau_f) is normally distributed.
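To make that statement concrete, here is a minimal sketch using only the Python standard library (not PyMC3). The values mu=2 and sigma=0.25 are illustrative assumptions, not taken from the question. It draws lognormal samples and checks that their logarithms have roughly that mean and standard deviation.

```python
import math
import random
import statistics

# Illustrative log-space parameters (assumed, not from the question)
mu, sigma = 2.0, 0.25

# Seeded generator so the run is repeatable
rng = random.Random(0)

# Draw lognormal samples, then move back to log space
draws = [rng.lognormvariate(mu, sigma) for _ in range(100_000)]
log_draws = [math.log(d) for d in draws]

# The logs should look like Normal(mu, sigma)
log_mean = statistics.fmean(log_draws)
log_sd = statistics.stdev(log_draws)
print(f"mean of log(x) = {log_mean:.3f}, sd of log(x) = {log_sd:.3f}")
```

The point is that mu and sigma describe log(x), not x itself.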

I had mentioned that my priors (c,v) and likelihood both need to be Lognormally distributed. Will the above expression still remain the same?

Could you please help me write the code as well?

The mean and standard deviation of the Normally distributed priors are mentioned in my question. Please help me write the entire code.

If I’m understanding you correctly, yes, the above lines of code should stay the same, except you’d replace pm.Normal() with pm.Lognormal as @harcel suggested, as in:

pm.Normal('param_name', mu=mu_param, sigma=sig_param)

is replaced with

pm.Lognormal('param_name', mu=mu_param, sigma=sig_param)

If this still doesn’t clear things up for you, perhaps you could provide some plots of your data or some more context to help us understand what the problem is.

What would be mu_param, sig_param for alpha, beta and Y_obs considering the Lognormal equation that you wrote? The mu_param, sig_param for the Normally distributed priors, and the Likelihood have been mentioned in my question.

They are exactly the same as the arguments for all three of your pm.Normal lines. Literally, replace every instance of the word “Normal” with “Lognormal” in your code. That’s all.

If you are unsure whether your parameters correspond to your mathematical model, you can evaluate the log-density manually and check the results:

pm.Lognormal.dist(mu=0, sigma=1).logp(x).eval()

For any mu, sigma and x
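For comparison, the lognormal log-density can also be written out by hand. This is a sketch in plain Python; the function name lognormal_logp is my own and not part of PyMC3. Its result for a given mu, sigma and x is what I would expect the logp call to return, assuming the usual log-space parametrization.

```python
import math

def lognormal_logp(x, mu, sigma):
    """Log-density of a lognormal whose log has mean mu and sd sigma."""
    if x <= 0:
        # The lognormal has no mass at or below zero
        return float("-inf")
    z = (math.log(x) - mu) / sigma
    return (-math.log(x) - math.log(sigma)
            - 0.5 * math.log(2 * math.pi) - 0.5 * z * z)

# Same arguments as the one-liner: mu=0, sigma=1, evaluated at x=1
value_at_one = lognormal_logp(1.0, mu=0.0, sigma=1.0)
print(value_at_one)
```

At x=1 with mu=0 and sigma=1 only the normalising constant is left, so the value is -0.5*log(2*pi).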

with pm.Model() as bm:

    # Intercept
    alpha = pm.Lognormal('alpha', mu=17.25, sigma=3.75)

    # Slope
    beta = pm.Lognormal('beta', mu=0.363, sigma=0.050)

    # Standard deviation
    sigma = sigma

    # Estimate of mean
    mean = alpha + beta*X_arr

    # Observed values
    Y_obs = pm.Normal('Y_obs', mu = mean, sd = sigma, observed = Y_arr)

trace = pm.sample(draws=10000,model=bm)

I tried the above code as suggested.

ValueError: Mass matrix contains zeros on the diagonal.
The derivative of RV beta_log__.ravel()[0] is zero.
The above exception was the direct cause of the following exception:

I am getting the above error.

In this case, chain 3 failed.

Also I changed the Likelihood to Lognormal as:

with pm.Model() as bm:

    # Intercept
    alpha = pm.Lognormal('alpha', mu=17.25, sigma=3.75)

    # Slope
    beta = pm.Lognormal('beta', mu=0.363, sigma=0.050)

    # Standard deviation
    sigma = sigma

    # Estimate of mean
    mean = alpha + beta*X_arr

    # Observed values
    Y_obs = pm.Lognormal('Y_obs', mu = mean, sd = sigma, observed = Y_arr)

trace = pm.sample(draws=10000,model=bm)

ValueError: Mass matrix contains zeros on the diagonal.
The derivative of RV beta_log__.ravel()[0] is zero.

The above exception was the direct cause of the following exception:

A similar error was thrown.

Have you tried prior predictive checks? I.e.: sample from your priors and plot the models on top of the data, to see if the priors are chosen well (the one on beta has a very small sigma)?
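Even before sampling, a rough check of what the priors imply is possible with the closed-form median and mean of a lognormal. The sketch below is plain Python; the helper name is hypothetical, and it assumes that pm.Lognormal takes mu and sigma in log space, so the numbers from the question are read as log-space values here.

```python
import math

def lognormal_median_and_mean(mu, sigma):
    """Median and mean of x when log(x) ~ Normal(mu, sigma)."""
    median = math.exp(mu)
    mean = math.exp(mu + 0.5 * sigma ** 2)
    return median, mean

# Prior parameters exactly as written in the Lognormal model above
alpha_median, alpha_mean = lognormal_median_and_mean(17.25, 3.75)
beta_median, beta_mean = lognormal_median_and_mean(0.363, 0.050)

print(f"alpha: median={alpha_median:.3g}, mean={alpha_mean:.3g}")
print(f"beta:  median={beta_median:.3g}, mean={beta_mean:.3g}")
```

If 17.25 and 0.363 were meant as values on the natural scale, these priors sit on a very different scale than intended (the alpha median is in the tens of millions), which may be related to the sampler failing. For a sampling-based check, PyMC3 also offers pm.sample_prior_predictive.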

No, I haven’t tried that. Let me post my original question so that everything is clear to you all.

It seems PyMC3 defines the mean and standard deviation (SD) of the Lognormal distribution differently from what I find in a standard textbook. Can somebody please explain which mean and SD I should use in PyMC3 when fitting a Lognormal distribution?

image

As shown in the figure, I have obtained Negative values of lambda and epsilon (the fitting parameters of Lognormal as given in Wikipedia).

I have used the Wikipedia relationship to obtain lambda and epsilon for pm.Lognormal() method. Initially, I used the above fitting parameters to fit a lognormal. But that didn’t work out.

I don’t have any data except the mean and sd of Normally distributed priors.

I’m afraid I’m not fully understanding you. First of all: you have no data? Then how did you obtain the lognormal parameters in your table?

As far as I know, the definitions of the mean and sd of the lognormal in PyMC3 are very standard. What are your lambda and epsilon? Typically, we speak about mu and sd (or sigma, or alternatively tau) to specify a lognormal. I’m guessing your epsilon is something like an error?

When you say you don’t have data, I’m confused as to what you are after. Do you maybe have a full notebook (or script) that I can look at?

The mu and sigma of the Lognormal are very straightforward: they are the mu and sigma of a normal in log space, before exponentiation. These are equivalent ways of modelling x as being Lognormally distributed:

import pymc3 as pm
import arviz as az

with pm.Model() as manual:
  x_log = pm.Normal('x_log', mu=2, sigma=0.25)
  x = pm.Deterministic('x', pm.math.exp(x_log))
  trace_manual = pm.sample()

print(az.summary(trace_manual))

with pm.Model() as auto:
  x = pm.Lognormal('x', mu=2, sigma=0.25)
  trace_auto = pm.sample()

print(az.summary(trace_auto))
#manual
        mean     sd  hdi_3%  hdi_97%  ...  ess_sd  ess_bulk  ess_tail  r_hat
x_log  2.006  0.253   1.519    2.467  ...  1883.0    1894.0    2896.0    1.0
x      7.677  1.983   4.279   11.355  ...  1901.0    1894.0    2896.0    1.0

# auto
    mean     sd  hdi_3%  hdi_97%  ...  ess_sd  ess_bulk  ess_tail  r_hat
x  7.676  1.952   4.344   11.301  ...  1599.0    1584.0    2966.0    1.0

This parametrization of mu and sigma is the one described first in the Wikipedia page: Log-normal distribution - Wikipedia
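As a cross-check on the summaries above, the mean and sd of x can also be computed in closed form from mu and sigma. This is a small sketch in plain Python; the helper name natural_scale_moments is mine, not a PyMC3 function.

```python
import math

def natural_scale_moments(mu, sigma):
    """Mean and sd of x when log(x) ~ Normal(mu, sigma)."""
    mean_x = math.exp(mu + 0.5 * sigma ** 2)
    sd_x = mean_x * math.sqrt(math.exp(sigma ** 2) - 1.0)
    return mean_x, sd_x

# Same mu and sigma as in the two models above
mean_x, sd_x = natural_scale_moments(2.0, 0.25)
print(f"mean of x = {mean_x:.3f}, sd of x = {sd_x:.3f}")
```

The results come out at roughly 7.6 and 1.9, which agrees with the sampled summaries up to Monte Carlo noise.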

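If you want to think about the mean and sigma of x (and not log_x), Wikipedia suggests you should set mu and sigma as:

\mu = \ln\left(\frac{\mu_X^2}{\sqrt{\mu_X^2 + \sigma_X^2}}\right), \quad \sigma^2 = \ln\left(1 + \frac{\sigma_X^2}{\mu_X^2}\right)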

I have assumed the data to be normally distributed. Assumptions are such that mu’ ± 3sd’ are within tolerable limits (physics of the problem). Thus, mu’ and sd’ are the assumed means and standard deviations for 5 different cases which are within their limits. Here, lambda and epsilon are the lognormal parameters as obtained from the below expression (Wikipedia).

I was thinking of using lambda and epsilon in pm.Lognormal for the priors, but that doesn’t seem to work with PyMC3. I have shared the part of the code that is necessary. I hope my question is clear now.

This explains a lot to me. Thanks.

However, my case is just the reverse. I know mean and sd of x (let’s say 7.677 and 1.983 respectively). Now, how do I determine mu =2 and sigma = 0.25 to fit pm.Lognormal on x?

You just plug those values in the wikipedia formula above:

import numpy as np

mu_x = 7.677
sigma_x = 1.983

mu_logx = np.log(mu_x**2 / np.sqrt(mu_x**2 + sigma_x**2))
sigma_logx = np.sqrt(np.log(1 + sigma_x**2 / mu_x**2))

print(f'{mu_logx=}, {sigma_logx=}')
# mu_logx=2.005934131548513, sigma_logx=0.25414450337258276

You plug mu_logx and sigma_logx in x = pm.Lognormal to get x distributed with the original mu_x and sigma_x.
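Applied to the priors from the original question, the same conversion could look like the sketch below. It assumes that 17.25 ± 3.75 and 0.363 ± 0.050 are the intended mean and sd of alpha and beta on the natural scale. The helper to_log_space is a hypothetical name, and only the standard library is used so the conversion can be checked by a round trip.

```python
import math

def to_log_space(mean, sd):
    """Convert the natural-scale mean and sd of x to the mu and sigma of log(x)."""
    mu_log = math.log(mean ** 2 / math.sqrt(mean ** 2 + sd ** 2))
    sigma_log = math.sqrt(math.log(1.0 + sd ** 2 / mean ** 2))
    return mu_log, sigma_log

# Natural-scale prior moments, assumed from the question
alpha_mu, alpha_sigma = to_log_space(17.25, 3.75)
beta_mu, beta_sigma = to_log_space(0.363, 0.050)

# Round trip: recover the natural-scale mean and sd of alpha
alpha_mean_back = math.exp(alpha_mu + 0.5 * alpha_sigma ** 2)
alpha_sd_back = alpha_mean_back * math.sqrt(math.exp(alpha_sigma ** 2) - 1.0)

print(f"alpha: mu={alpha_mu:.4f}, sigma={alpha_sigma:.4f}")
print(f"beta:  mu={beta_mu:.4f}, sigma={beta_sigma:.4f}")

# Possible use inside the model (not executed here):
# alpha = pm.Lognormal('alpha', mu=alpha_mu, sigma=alpha_sigma)
# beta = pm.Lognormal('beta', mu=beta_mu, sigma=beta_sigma)
```

The round trip returns 17.25 and 3.75 for alpha, so the converted values describe the same prior on the natural scale.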