Interpreting Bayesian Inference with multiple parameters for the MLE


#1

This is a broad question, as I am trying to understand the usefulness of Bayesian inference when we have a model with multiple parameters.

In general, I am interested in the MLE. However, the reason I am using pymc3 is that it allows me to inspect the full distribution of a parameter of interest, not just an error interval.

Now I am getting into models with multiple parameters and I am a bit confused. The MLE (obtained through optimization) yields the combination of parameters that maximizes the likelihood.
Bayesian inference, as far as I understand, produces individual posteriors for each parameter (not a joint, multidimensional distribution over the parameters).

What should I do to find the set of values that maximizes the likelihood? Could it simply be the mean of each individual parameter's posterior?

pointers on that would be very welcome :slight_smile:

Thanks


#2

You should not use the MLE: it is usually a poor representation of the posterior. Being a single point in a high-dimensional space, it gives at best an incomplete picture of that space, and at worst a misleading one. More generally, Bayesian inference is about working with the full posterior distribution; a single point estimate like the MLE is only one (pretty limited) way to summarize it.

With that out of the way, to answer your question: in a model with multiple parameters, the likelihood is a function that maps the parameter vector to a value, \theta \mapsto f(\theta), so the MLE is the vector \theta that maximizes f(\theta). It is not "the mean of each individual parameter's posterior".
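To make the distinction concrete, here is a minimal sketch (using a toy normal model, not your model; all names and data are illustrative) of what the MLE actually is: a joint optimization over the whole parameter vector, found with scipy rather than from per-parameter summaries.

```python
# Minimal sketch: the MLE is the single vector maximizing the joint likelihood.
import numpy as np
from scipy import optimize, stats

rng = np.random.default_rng(0)
data = rng.normal(loc=1.0, scale=2.0, size=100)  # toy data, assumed here

def neg_log_lik(theta):
    # theta is the full parameter vector (mu, log_sigma); sigma is optimized
    # on the log scale to keep it positive.
    mu, log_sigma = theta
    return -np.sum(stats.norm.logpdf(data, loc=mu, scale=np.exp(log_sigma)))

# Both coordinates are found together by one joint optimization.
result = optimize.minimize(neg_log_lik, x0=np.array([0.0, 0.0]))
mu_hat, sigma_hat = result.x[0], np.exp(result.x[1])
print(mu_hat, sigma_hat)
```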


#3

My confusion comes particularly from the fact that the output of Bayesian inference is individual parameter posteriors.
At the end of the day, I am interested in estimating latent parameters that explain a phenomenon. Say the alpha and beta of a beta-binomial: having the mean and credible interval of alpha is great, but it is meaningless without the corresponding beta.
As I said, I am used to the MLE returning a particular combination of parameters (for the task of maximizing the log-likelihood).
I hope I have homed in on my question. Thank you.


#4

Why not try pm.sample and compute the mean of the posterior? If you really need a point estimate you can also use pm.find_MAP (which coincides with the MLE under flat priors).
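For concreteness, a minimal sketch of both options on a beta-binomial like the one you describe; the priors, variable names and data here are illustrative assumptions, not taken from your model:

```python
import numpy as np
import pymc3 as pm

counts = np.array([3, 5, 2, 7, 4])   # hypothetical successes per group
n_trials = 10

with pm.Model() as model:
    alpha = pm.HalfNormal("alpha", sigma=10)
    beta = pm.HalfNormal("beta", sigma=10)
    pm.BetaBinomial("obs", alpha=alpha, beta=beta, n=n_trials, observed=counts)

    trace = pm.sample(2000, tune=1000)   # draws from the full joint posterior
    map_estimate = pm.find_MAP()         # single most-probable point

print(trace["alpha"].mean(), trace["beta"].mean())
print(map_estimate)
```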


#5

@ded, I don't really understand what you mean by the posterior of individual parameters. Do you mean that sample returns a marginal probability distribution for each parameter separately? If that is what you meant, I'm afraid you are confused. The output of Bayesian inference is the joint probability distribution of all the model parameters. The trace returned by sample stores draws from that joint distribution, i.e. combinations of the parameters sampled from the joint posterior, and you can access them with __getitem__, point or points. From that list of points you can compute the expected value of all the parameters as well as their covariance matrix. If you want to find the single most probable point, you should use find_MAP.
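To illustrate, a minimal sketch assuming the trace from a model with parameters named alpha and beta (the names are illustrative): each draw is a joint (alpha, beta) point, so summaries are computed over those paired draws.

```python
import numpy as np

# Stack the paired draws: row i is the joint (alpha, beta) sample from draw i.
samples = np.column_stack([trace["alpha"], trace["beta"]])

posterior_mean = samples.mean(axis=0)          # E[alpha], E[beta]
posterior_cov = np.cov(samples, rowvar=False)  # 2x2 posterior covariance

one_joint_draw = trace.point(0)                # a single {alpha, beta} dict
print(posterior_mean, posterior_cov, one_joint_draw)
```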


#6

Awesome. Thank you!
I was indeed confusing the marginals (individual posteriors) with the joint probability distribution of the parameters. I've been staring at the output of plot_posterior for way too long.

What added to this confusion is that I couldn't find tools or examples for analyzing the joint distribution.

So, is there a way to plot the joint posterior of two specified parameters?


#7

You can use pairplot in pymc3, e.g.: https://docs.pymc.io/notebooks/Diagnosing_biased_Inference_with_Divergences.html?highlight=pairplot
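For example, a minimal sketch assuming a trace with alpha and beta parameters; in recent versions the plotting goes through ArviZ, where the same plot is called plot_pair:

```python
import arviz as az
import matplotlib.pyplot as plt

# Joint posterior of two chosen parameters, with divergent draws highlighted.
az.plot_pair(trace, var_names=["alpha", "beta"], kind="kde", divergences=True)
plt.show()
```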


#8

Looks like this is what I need, thanks a million!


#9

By the way, I have a model that does not converge for my data. Using scipy's optimization had led me to a local optimum, and I didn't realize the issue at the time. PyMC3 helped me detect this, even though I am using uninformative priors (basically just uniform ranges).