Understanding the likelihood function

I am struggling to understand a key element of an inference model in PyMC3.

The likelihood can be understood as “how you think your data are distributed” (?), but I am still confused.

I have a simple example to illustrate my confusion:

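import numpy as np
import pymc3 as pm

data = np.random.exponential(scale=2, size=1000)  # true rate lambda = 1/scale = 0.5
with pm.Model() as model:
    # prior on the rate lam; it must be positive, so a HalfNormal rather than a Normal
    lam = pm.HalfNormal("lam", sigma=1)
    # exponential likelihood for the observed data
    pm.Exponential("L", lam=lam, observed=data)
    trace = pm.sample(draws=1000)
    ppc = pm.sample_posterior_predictive(trace)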

In this case, I am inferring the rate lambda of data that I believe are exponentially distributed, hence my exponential likelihood function (?).
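To make "likelihood" concrete here: once the data are fixed, the exponential likelihood is a function of lambda alone, log L(lambda) = n·log(lambda) − lambda·Σx_i. The following is a minimal NumPy sketch (no PyMC3; the seed and the grid range are arbitrary assumptions) that evaluates this function on a grid and checks that its maximum sits at the closed-form estimate 1/mean(data), close to the true 0.5.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen arbitrarily for reproducibility
data = rng.exponential(scale=2, size=1000)  # true rate lambda = 1 / scale = 0.5


def exp_log_likelihood(lam, x):
    """Log-likelihood of rate `lam` for i.i.d. exponential data x:
    log L(lam) = n * log(lam) - lam * sum(x). Works for scalar or array `lam`."""
    return len(x) * np.log(lam) - lam * np.sum(x)


# Evaluate the log-likelihood on a grid of candidate rates, data held fixed
grid = np.linspace(0.05, 2.0, 2000)
loglik = exp_log_likelihood(grid, data)

lam_hat_grid = grid[np.argmax(loglik)]
lam_hat_closed_form = 1.0 / np.mean(data)  # analytic maximum-likelihood estimate

print(f"grid argmax: {lam_hat_grid:.3f}, closed form 1/mean: {lam_hat_closed_form:.3f}")
```

In the PyMC3 model, the pm.Exponential("L", ...) line with observed=data adds exactly this term to the model's log-probability; the sampler combines it with the log-prior on lam to get the posterior.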

Running sample_posterior_predictive seems to confirm that my model is roughly correct (original data in blue):

[Figure: posterior predictive samples overlaid on the original data (blue)]

  1. Can the likelihood function not change shape? That is, if my likelihood is normal but my data are exponentially distributed, would the output of sample_posterior_predictive still be normally distributed? In that case, is choosing the right likelihood function crucial?

  2. If the above is correct, how can more complex models have a normally distributed likelihood? For example, Getting started with PyMC3 (John Salvatier) opens with a motivating linear-regression example that has several prior random variables, yet the likelihood is still normal. Why? Aren't we then saying that the data come from a normal distribution? What exactly is normally distributed in this likelihood?
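Question 1 can be made concrete with a minimal NumPy sketch. It is not PyMC3 and, as a simplification, uses maximum-likelihood point estimates in place of a full posterior. It fits a Normal and an Exponential likelihood to the same exponential data and simulates new data from each. Whatever parameters are fitted, draws from the Normal stay symmetric and can go negative, while the data never do.

```python
import numpy as np

rng = np.random.default_rng(1)  # arbitrary seed
data = rng.exponential(scale=2, size=1000)  # strictly positive, right-skewed

# Plug-in (maximum-likelihood) fits instead of full posteriors, to keep this PyMC-free
mu_hat, sigma_hat = data.mean(), data.std()  # Normal likelihood
lam_hat = 1.0 / data.mean()                  # Exponential likelihood

# "Predictive" draws: new data simulated from each fitted likelihood
pred_normal = rng.normal(mu_hat, sigma_hat, size=10_000)
pred_expon = rng.exponential(scale=1.0 / lam_hat, size=10_000)

frac_neg_normal = np.mean(pred_normal < 0)
frac_neg_expon = np.mean(pred_expon < 0)
print(f"negative draws, Normal likelihood:      {frac_neg_normal:.1%}")
print(f"negative draws, Exponential likelihood: {frac_neg_expon:.1%}")
```

Because the mean and the standard deviation of this data are both about 2, the fitted Normal puts roughly 16% of its mass below zero, which a histogram of the predictive draws would show immediately.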

Thank you so much to anyone taking the time to answer this. I believe this is the last piece of the puzzle in understanding, in broad terms, how this inference works.

If you have not seen it, a talk I gave a few years back on likelihoods should be a good place to start or to consolidate your understanding: GitHub - junpenglao/All-that-likelihood-with-PyMC3

As for your question:

Choosing the likelihood function is always crucial. The likelihood function cannot change shape (technically its shape does change with the parameters, e.g., the Gaussian becomes wider with larger σ), but you do see shape changes with sample_posterior_predictive or sample_prior_predictive, because the predictive draws can depend on other information. You can see this simply with:

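import numpy as np
import pymc3 as pm
with pm.Model():
    sigma = pm.HalfNormal("sigma", 1)
    # each observation i has its own mean i: a vector of Normals, not one shared Normal
    obs = pm.Normal("obs", np.arange(10), sigma, observed=np.arange(10) + np.random.randn(10))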
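Here is what the prior predictive of the model above looks like, as a minimal NumPy re-simulation rather than a PyMC3 call (the seed and the number of draws are arbitrary assumptions). Each observation is Normal around its own mean, but pooling all observations gives a broad, flat-topped spread, which is not a bell curve. Excess kurtosis, which is 0 for a Normal, is used as a simple numeric check.

```python
import numpy as np

rng = np.random.default_rng(2)  # arbitrary seed
n_draws, n_obs = 4000, 10
x = np.arange(n_obs)  # plays the role of np.arange(10) in the model above

# sigma ~ HalfNormal(1): the absolute value of a standard normal draw
sigma = np.abs(rng.normal(0.0, 1.0, size=n_draws))

# obs_i ~ Normal(x_i, sigma): one Normal per observation, each with its own mean.
# Broadcasting (1, n_obs) means against (n_draws, 1) scales gives shape (n_draws, n_obs).
prior_pred = rng.normal(loc=x[None, :], scale=sigma[:, None])

# Each column is centred on its own x_i ...
col_means = prior_pred.mean(axis=0)

# ... but pooling every column gives a broad, flat-topped spread over roughly 0..9
pooled = prior_pred.ravel()
z = pooled - pooled.mean()
pooled_kurt = np.mean(z**4) / np.var(pooled) ** 2 - 3.0  # excess kurtosis; 0 for a Normal

print("column means:", np.round(col_means, 1))
print("pooled excess kurtosis:", round(pooled_kurt, 2))
```

Inside the PyMC3 model context, pm.sample_prior_predictive() returns the analogous draws for "obs".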

Hopefully, with the above, you can now answer (2). The key is to understand whether the random variable representing the observations is a scalar-like variable that you draw from repeatedly

num_obs = 10
obs = np.random.normal(3, 2.5, size=num_obs)

or a vector-like variable that depends on other information

x = np.arange(num_obs)
obs = np.random.normal(x, 2.5)

In both cases obs.shape == (10,) (with num_obs = 10), but the interpretation, and how you would build a Bayesian model, is very different.
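For question 2, here is a minimal NumPy sketch of the vector-like case in a linear regression. It is not the model from the Getting Started guide, and alpha, beta, sigma and num_obs are hypothetical values chosen for the simulation. The raw y values are not normal at all, because x is uniform. What the Normal likelihood describes is y_i given x_i, i.e., the residuals around alpha + beta * x_i, and those do look normal.

```python
import numpy as np

rng = np.random.default_rng(3)  # arbitrary seed
num_obs = 500
alpha, beta, sigma = 1.0, 2.0, 0.5  # hypothetical "true" values for the simulation

x = rng.uniform(0, 10, size=num_obs)
# Vector-like observation: each y_i has its own mean alpha + beta * x_i
y = rng.normal(alpha + beta * x, sigma)


def excess_kurtosis(v):
    """0 for a Normal, about -1.2 for a uniform distribution."""
    z = v - v.mean()
    return np.mean(z**4) / np.var(v) ** 2 - 3.0


# Least-squares fit; for alpha and beta this coincides with the Gaussian maximum-likelihood estimate
beta_hat, alpha_hat = np.polyfit(x, y, deg=1)  # polyfit returns the highest degree first
residuals = y - (alpha_hat + beta_hat * x)

print(f"alpha_hat={alpha_hat:.2f}, beta_hat={beta_hat:.2f}, residual sd={residuals.std():.2f}")
print(f"excess kurtosis of y:         {excess_kurtosis(y):.2f}")  # far from a Normal
print(f"excess kurtosis of residuals: {excess_kurtosis(residuals):.2f}")  # close to 0
```

In PyMC3 terms, a line such as pm.Normal("y", mu=alpha + beta * x, sigma=sigma, observed=y) states exactly this: each y_i is Normal around its own mean, not that all y values share a single Normal.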
