How to choose a distribution for the likelihood step?

I have been looking over quite a few examples.
I am just wondering how to choose a proper distribution for the likelihood step.

For instance, I am looking at a hierarchical switchpoint analysis with continuous measurements instead of count data.
Should I be using pm.Poisson for the likelihood, or should I use pm.Normal?

If your measurement is continuous on the real domain, pm.Normal is usually a good first choice.

It seems you are conflating a decision fundamental to statistics (“which distribution do I choose for my model?”) with PyMC3’s inner workings; whether or not you use a hierarchical analysis has no bearing on the choice of distribution.

I suggest you work through McElreath’s book, which, among other modelling decisions, addresses this question of when to use which distributions.


Can we use a Weibull distribution for the likelihood? If so, can you provide some reference? I have tried to find some but could not.
Thanks in advance

Welcome!

I guess it depends what you mean by a “reference”. But you can certainly use a Weibull if you think that’s appropriate:

import arviz as az
import pymc as pm

with pm.Model() as model:
    alpha = pm.HalfNormal("alpha", sigma=5)
    beta = pm.HalfNormal("beta", sigma=5)
    likelihood = pm.Weibull(
        "likelihood", alpha=alpha, beta=beta, observed=[1, 2, 3, 4, 5, 6, 7, 8]
    )
    idata = pm.sample()

az.summary(idata)

yields:

        mean     sd  hdi_3%  hdi_97%  ...  mcse_sd  ess_bulk  ess_tail  r_hat
alpha  2.014  0.585   0.935    3.082  ...    0.010    1798.0    1825.0    1.0
beta   5.227  1.044   3.339    7.154  ...    0.017    1816.0    2048.0    1.0

Thanks for your response, I appreciate your kindness. You answered exactly what I needed. Thanks again.
I have one more question.
In most cases, I have found that a normal likelihood is used. Unfortunately, my observations/evidence are not normally distributed, and a Weibull distribution is a better fit. In such a case, can I use a Weibull likelihood? Your response will be highly appreciated.
Thanks in advance.

In some sense, the answer to these sorts of questions is always something along the lines of “there are no rules”. You can build your model in any way you see as appropriate. That being said, there are likely to be conventions within the area you happen to be working in. Some common likelihoods can be seen in this list provided by Bambi. The Weibull distribution is commonly used to model the survival function in survival analyses (e.g., here). But there are probably lots of other applications that I don’t know about.

Thanks for your reply.
I really appreciate your kindness.
I will go through the reference you provided.
Best regards


Hi,
I have checked some previous studies in my field. I found that most studies used a normal likelihood with a normal-gamma prior. However, in the examples I have seen a normal/half-normal prior, so I got a little confused. I am also confused about choosing the parameters of the normal-gamma prior (specifically alpha and beta). Would you kindly advise on this matter? Your responses will be highly appreciated.
Thanks in advance.

Priors on scale parameters (e.g., the sigma parameter in a normal distribution) were historically gamma-distributed. But the parameterization of gamma distributions isn’t super intuitive, so a lot of people prefer to use a half-normal. Ultimately, you should do some prior predictive checking to see if your model + your priors yield results that make sense to you. See this notebook for information about this workflow.

Thank you so much for your reply.
I appreciate it.
Best regards

Hi,
I have tried both half-normal and gamma; the posterior statistics are very close.
I am now trying to examine the posterior predictive.

I am sincerely thankful for your responses.

May I have one more query? I am trying to plot the joint distribution of my mean and sigma, but I have not been successful yet. Can you kindly advise? A sample code will be highly appreciated.

Thanks in advance
Best regards

You can try az.plot_pair().
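For example, with some stand-in posterior samples (generated here with NumPy so the snippet is self-contained; in practice you would pass your own trace):

```python
import arviz as az
import numpy as np

# Stand-in posterior samples (4 chains x 500 draws) for two parameters
rng = np.random.default_rng(0)
idata = az.from_dict(
    posterior={
        "mu": rng.normal(5.0, 1.0, size=(4, 500)),
        "sigma": np.abs(rng.normal(2.0, 0.5, size=(4, 500))),
    }
)

# Joint distribution of mu and sigma as a KDE pair plot
ax = az.plot_pair(idata, var_names=["mu", "sigma"], kind="kde")
```

Note that plot_pair expects posterior samples (an InferenceData object or arrays), not the model’s symbolic variables.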

Thank you so much for your reply.
I found that az.plot_kde is useful. In my case, I have two parameters: mu and sigma.
I am trying to use it, but I am not successful. I am using the following code:

non_centered = az.load_arviz_data('non_centered_eight')
mu_posterior = np.concatenate(non_centered.posterior["mu"].values)
sigma_posterior = np.concatenate(non_centered.posterior["sigma"].values)
az.plot_kde(mu, values2=sigma, contour=False)

I received the following message:


TypeError                                 Traceback (most recent call last)
TypeError: float() argument must be a string or a real number, not 'TensorVariable'

The above exception was the direct cause of the following exception:

ValueError                                Traceback (most recent call last)
Cell In[111], line 1
----> 1 az.plot_kde(mu, values2=tau, contour=False)

File ~\anaconda3\Lib\site-packages\arviz\plots\kdeplot.py:271, in plot_kde(values, values2, cumulative, rug, label, bw, adaptive, quantiles, rotated, contour, hdi_probs, fill_last, figsize, textsize, plot_kwargs, fill_kwargs, rug_kwargs, contour_kwargs, contourf_kwargs, pcolormesh_kwargs, is_circular, ax, legend, backend, backend_kwargs, show, return_glyph, **kwargs)
    269 else:
    270     gridsize = (128, 128) if contour else (256, 256)
--> 271 density, xmin, xmax, ymin, ymax = _fast_kde_2d(values, values2, gridsize=gridsize)
    273 if hdi_probs is not None:
    274     # Check hdi probs are within bounds (0, 1)
    275     if min(hdi_probs) <= 0 or max(hdi_probs) >= 1:

File ~\anaconda3\Lib\site-packages\arviz\stats\density_utils.py:808, in _fast_kde_2d(x, y, gridsize, circular)
    785 def _fast_kde_2d(x, y, gridsize=(128, 128), circular=False):
    786     """
    787     2D fft-based Gaussian kernel density estimate (KDE).
    788
    (…)
    806     ymax: maximum value of y
    807     """
--> 808     x = np.asarray(x, dtype=float)
    809     x = x[np.isfinite(x)]
    810     y = np.asarray(y, dtype=float)

ValueError: setting an array element with a sequence.

Would you provide some advice?

Thanks a lot.

I think you can just do this?

non_centered = az.load_arviz_data('non_centered_eight')
az.plot_density(non_centered.posterior, var_names=["mu"])

or

non_centered = az.load_arviz_data('non_centered_eight')
az.plot_posterior(non_centered.posterior, var_names=["mu"])

There is no sigma parameter in that dataset.

Thank you for your reply.

I could plot the joint distribution using Seaborn and get the figures I needed. I am thankful for your responses.
Currently, I am trying to fit the parameters of a normal-gamma distribution. I was trying to use MLE, but I found that the initial guess affects the final results. Are there any established rules of thumb for the initial guess? Or can I use method-of-moments estimates as the initial guess?
Thanks in advance

Can you say more about what you are trying to do here? Fit in what way? Are you constructing priors or estimating values for the parameters in your model?

Thanks for your reply.
I have some data from the literature and from experiments. I am trying to construct the prior from these data.

Thanks in advance

You can look into PyMC’s find_constrained_prior() function and see if that helps.