Getting point estimate from posterior

nkaimcaudle · January 21, 2021, 7:56am

Hello

I have a fairly complex regression model for which I can successfully obtain a posterior. I need to extract a single point estimate for the coefficients from the posterior for use in my wider workflow. I realise that I should use the full posterior, run it through the entire workflow and only make a decision at the end; however, my setup doesn’t allow for this (at the moment).

Are there any suggested methods for extracting a single set of coefficients? Here it simply says:

In summary, while PyMC3 provides the function find_MAP(), at this point mostly for historical reasons, this function is of little use in most scenarios. If you want a point estimate you should get it from the posterior. In the next section we will see how to get a posterior using sampling methods.

I can think of a few methods:
1/ Use find_MAP()
2/ From the posterior use the point with the highest log-likelihood
3/ Use the mean of the posterior
4/ Use the median of the posterior
5/ Define some sort of distance metric and select the draw that is closest to all the others

The outputs from 2 and 5 would each be an actual sample from the posterior, whereas those from 3 and 4 would not be.
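For reference, options 2 to 5 can be sketched with plain NumPy. This is only an illustration under stated assumptions: `draws` is a synthetic stand-in for posterior draws flattened to shape (n_draws, n_params), and `logp` is a hypothetical per-draw log density that in practice would have to come from the sampler's output. The standardised Euclidean distance used for option 5 is one possible choice of metric, not a recommendation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for posterior draws, shape (n_draws, n_params).
# In a real workflow this array would be built from the sampled trace.
n_draws = 500
draws = np.column_stack([
    rng.normal(1.0, 0.5, n_draws),      # roughly symmetric coefficient
    rng.normal(-2.0, 1.0, n_draws),     # roughly symmetric coefficient
    rng.lognormal(0.0, 0.75, n_draws),  # skewed coefficient: mean != median
])

# Hypothetical per-draw log density (up to a constant), matching the
# synthetic distributions above. Assumed to be available per draw.
logp = (
    -0.5 * ((draws[:, 0] - 1.0) / 0.5) ** 2
    - 0.5 * ((draws[:, 1] + 2.0) / 1.0) ** 2
    - np.log(draws[:, 2])
    - 0.5 * (np.log(draws[:, 2]) / 0.75) ** 2
)

# Option 2: the draw with the highest log density (an actual sample).
best_draw = draws[np.argmax(logp)]

# Options 3 and 4: per-coefficient mean and median (generally not a sample).
post_mean = draws.mean(axis=0)
post_median = np.median(draws, axis=0)

# Option 5: the medoid, i.e. the draw with the smallest total distance to
# all other draws. Coefficients are standardised first so that no single
# coefficient dominates the distance purely because of its scale.
z = (draws - draws.mean(axis=0)) / draws.std(axis=0)
pairwise = np.linalg.norm(z[:, None, :] - z[None, :, :], axis=-1)
medoid = draws[np.argmin(pairwise.sum(axis=1))]

print("max-logp draw:", best_draw)
print("mean:         ", post_mean)
print("median:       ", post_median)
print("medoid:       ", medoid)
```

The pairwise distance matrix is n_draws by n_draws, so this approach is only practical for a modest number of draws. Taking the mean or median per coefficient also ignores correlations between coefficients, which is one reason the resulting vector need not resemble any single draw.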

A few notes on the model:

It is hierarchical
Most of the RVs are Normal; however, a few have mean != median.
There are ~20 RVs
Due to time constraints I only run a small number of draws; the ESS is typically ~200.

Any thoughts?

cluhmann · January 22, 2021, 12:13am

As you suggest, the most “Bayesian” answer is to not do this, and the documentation regarding the MAP reflects this. But if you must, then you need to select an estimator, and do so considering how collapsing your full posterior into a single point estimate will influence downstream inferences. Your tolerance for different kinds of consequences should shape which estimator you ultimately select. The mean, median, and mode, for example, all reflect different loss functions. Keep in mind that some of these estimators may seem more reasonable assuming “well-behaved” distributions (e.g., symmetric, unimodal, etc.), but posteriors aren’t always so cooperative.
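To make the link between estimators and loss functions concrete, here is a small numerical check. It is a sketch on synthetic, skewed samples (an assumption, standing in for one coefficient's posterior draws): it scans a grid of candidate point estimates and confirms that the average squared error is smallest near the sample mean, while the average absolute error is smallest near the sample median.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic skewed samples standing in for one coefficient's posterior draws.
samples = rng.lognormal(0.0, 0.75, 2000)

# Candidate point estimates on an evenly spaced grid over the sample range.
candidates = np.linspace(samples.min(), samples.max(), 2001)
step = candidates[1] - candidates[0]

# Average loss of each candidate against all samples.
errors = samples[None, :] - candidates[:, None]
sq_loss = (errors ** 2).mean(axis=1)
abs_loss = np.abs(errors).mean(axis=1)

best_sq = candidates[np.argmin(sq_loss)]
best_abs = candidates[np.argmin(abs_loss)]

print("squared-error minimiser: ", best_sq, " sample mean:  ", samples.mean())
print("absolute-error minimiser:", best_abs, " sample median:", np.median(samples))
```

Because the samples are skewed, the two minimisers differ noticeably, so the choice of loss function changes the point estimate that gets passed downstream.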

We had some related discussion here.

Martin_Ingram · January 22, 2021, 12:29am

I completely agree with cluhmann’s answer. I would just add that the posterior mode for continuous variables doesn’t really correspond to any choice of loss function (see the passage after “When the loss function is of the form…” in the Wikipedia article on maximum a posteriori estimation). In addition, find_MAP can have issues with hierarchical models, so I would suggest steering away from that. To choose from the others, you can take a look at cluhmann’s resources. The posterior mean, for example, minimises the expected squared-error loss and can be a reasonable choice in many cases.
