The limits of using the traceplot to update priors?

Gon_F · April 13, 2019, 6:56am

After the first version of my model has run, I have always felt tempted to update all my variables’ priors to match what the pm.traceplot() graphs show. Is this a valid method of improving one’s model?

I had read a quote a while back from Gelman that trace plots can indeed be utilized to further inform one’s priors, but there has to be some limit to this. Otherwise, why wouldn’t the default modelling process always be to

1. run the model with vague priors, and
2. if the traceplots create “stable enough” looking variable distributions, use kernel methods on each stable variable to gain its mean and std, and feed those into the respective model variables?
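
As a rough illustration of what the loop above does when the same data are reused each round, here is a minimal sketch. It is not PyMC code: it assumes a normal likelihood with known noise standard deviation so the posterior has a closed form, and the helper name `normal_posterior` and all numbers are made up for the example.

```python
import random
import statistics

def normal_posterior(prior_mean, prior_sd, data, noise_sd=1.0):
    """Conjugate update for the mean of a normal with known noise_sd."""
    prior_prec = 1.0 / prior_sd**2
    data_prec = len(data) / noise_sd**2
    post_prec = prior_prec + data_prec
    post_mean = (prior_prec * prior_mean + data_prec * statistics.fmean(data)) / post_prec
    return post_mean, post_prec ** -0.5

random.seed(0)
data = [random.gauss(2.0, 1.0) for _ in range(20)]  # simulated observations

mean, sd = 0.0, 10.0  # vague starting prior
history = []
for round_ in range(4):
    # Feed the previous posterior back in as the prior, with the SAME data.
    mean, sd = normal_posterior(mean, sd, data)
    history.append((mean, sd))
    print(f"round {round_ + 1}: mean={mean:.3f}, sd={sd:.3f}")
```

Under these assumptions the posterior standard deviation shrinks every round even though no new data arrived, which is the double counting that makes the loop suspect.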

I feel this is an amateurish question, but could someone please explain how to properly use traceplots (assuming everything has already converged correctly so there are no problems there) to improve one’s model?

junpenglao · April 13, 2019, 9:37am

It was recently debated on Twitter that updating priors under the Bayesian framework is actually not trivial (outside of conjugate models, which are in most cases uninteresting). Personally, I would approximate the posterior with some heavy-tailed distribution (say, a t distribution). Otherwise, Bayesian filters (Kalman filter, particle filter) sound like a promising framework that I would love to explore more.
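
One possible way to do the heavy-tailed approximation mentioned above is simple moment matching. The sketch below is only that: the helper `student_t_summary`, the fixed choice `nu=4`, and the simulated `fake_trace` are assumptions for illustration, not something prescribed in this thread.

```python
import math
import random
import statistics

def student_t_summary(samples, nu=4.0):
    """Moment-match a Student-t (fixed nu > 2) to posterior draws."""
    if nu <= 2:
        raise ValueError("nu must exceed 2 for the variance to exist")
    loc = statistics.median(samples)
    sd = statistics.stdev(samples)
    # Var[t] = scale^2 * nu / (nu - 2), so solve for scale.
    scale = sd * math.sqrt((nu - 2.0) / nu)
    return {"nu": nu, "mu": loc, "sigma": scale}

random.seed(1)
# Stand-in for the draws of one variable taken from a trace.
fake_trace = [random.gauss(0.5, 0.2) for _ in range(4000)]
params = student_t_summary(fake_trace)
print(params)
```

The resulting values could then be handed to a Student-t prior such as `pm.StudentT`, though the exact keyword names should be checked against your PyMC version. Using the sample standard deviation directly as the scale would give an even wider, more conservative prior.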

Gon_F · April 13, 2019, 5:16pm

Bayesian filters sound interesting, will look into them.

For updating one’s priors based only on model results, there must be some best practice that maintains statistical rigor without taking shortcuts.

I’m naively imagining that best practice to simply be: start from the vaguest priors, and only utilize subsequent traceplot information to the minimum necessary amount until the model fully converges/displays no serious errors.

But, again, I have no clue, so I will try to research this important question.

chartl · April 13, 2019, 6:45pm

Do you have a link to this discussion? This sounded to me like “straightforward” Empirical Bayes, so I’m surprised that there’s much more to it.

Gon_F · April 13, 2019, 7:56pm

I just found a useful example in the docs that matches the idea I had in the first post.

https://docs.pymc.io/notebooks/updating_priors.html

However, very little justification is given for this procedure, or info on where it can go wrong.

chartl · April 13, 2019, 8:33pm

That example more closely matches online learning, where an additional (say, kth) set of data comes in and you’d like to update the estimates you have after set 1, …, k-1. Whereas you’re more interested in making your priors “better” without new data.
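
To make the online-learning case concrete, here is a small sketch using a conjugate Beta-Bernoulli model (an assumption chosen so the update is exact; the batches are invented). Each batch's posterior becomes the prior for the next batch, and because every observation is used exactly once, the result matches a single fit to all the data.

```python
def update_beta(alpha, beta, batch):
    """Conjugate Beta-Bernoulli update: batch is a list of 0/1 outcomes."""
    successes = sum(batch)
    return alpha + successes, beta + len(batch) - successes

batches = [[1, 0, 1, 1], [0, 0, 1], [1, 1, 1, 0, 1]]  # data sets 1, ..., k

alpha, beta = 1, 1  # flat Beta(1, 1) prior
for batch in batches:
    # Posterior after this batch is the prior for the next one.
    alpha, beta = update_beta(alpha, beta, batch)

# Same prior, all data at once.
all_data = [x for batch in batches for x in batch]
alpha_all, beta_all = update_beta(1, 1, all_data)
print((alpha, beta), (alpha_all, beta_all))
```

With non-conjugate models the posterior after each batch has to be approximated before it can serve as the next prior, which is where the difficulty comes in.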

Empirical Bayes is a simple way of doing this; and there are approaches for conjugate, non-conjugate, nonparametric (&c). In the simplest case, you have some prior \pi(\theta|\xi) which is parameterized by \xi (so for N(0, \sigma^2), \xi=\sigma^2). You’d assign a hyperprior to \xi, generating a full joint prior \pi(\theta, \xi)=\pi(\theta|\xi)p(\xi) (e.g. \sigma^2 \sim \mathrm{HalfNormal}(5.)).

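Empirical Bayes (in its simplest version) seeks a point estimate \hat \xi = \mathrm{argmax}_\xi\{p(\xi)\int P(X|\theta)\pi(\theta|\xi)\,d\theta\}; i.e. the maximum of the marginal likelihood, weighted by the hyperprior (with a flat p(\xi) this is exactly the maximum marginal likelihood). You could approximate this directly off the trace as the mean of the sampled \sigma^2; strictly, the argmax is the mode of the marginal posterior of \sigma^2, and the mean is close to it when that posterior is roughly symmetric.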
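
For a case where the marginal likelihood is available in closed form, the sketch below may help. It assumes a specific toy model that is not from this thread: x_i | \theta_i \sim N(\theta_i, 1) and \theta_i \sim N(0, \sigma^2), so that marginally x_i \sim N(0, 1 + \sigma^2), with a flat hyperprior on \sigma^2. All names and numbers are illustrative.

```python
import math
import random

def log_marginal(xs, sigma2, noise_var=1.0):
    """log p(x | sigma2) with theta_i integrated out: x_i ~ N(0, noise_var + sigma2)."""
    total_var = noise_var + sigma2
    n = len(xs)
    return (-0.5 * n * math.log(2 * math.pi * total_var)
            - sum(x * x for x in xs) / (2 * total_var))

random.seed(2)
true_sigma2 = 4.0
thetas = [random.gauss(0.0, math.sqrt(true_sigma2)) for _ in range(500)]
xs = [random.gauss(t, 1.0) for t in thetas]

# Closed-form maximiser of the marginal likelihood for this toy model.
sigma2_hat = max(0.0, sum(x * x for x in xs) / len(xs) - 1.0)

# Crude grid search as a sanity check on the closed form.
grid = [i * 0.01 for i in range(0, 2001)]
sigma2_grid = max(grid, key=lambda s2: log_marginal(xs, s2))
print(sigma2_hat, sigma2_grid)
```

Note that only the hyperparameter \sigma^2 is optimised here; the individual \theta_i are integrated out rather than point-estimated.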

Importantly, the only parameters that update are the hyperparameters, which (unless you over-parameterized your prior) protects you from doing MAP estimation \hat \theta = \mathrm{argmax}_\theta\{P(X|\theta)\pi(\theta)\}.

(For example, using \pi(\theta|\xi) = \mathcal{N}(\theta| \mu_\xi, \sigma^2_\xi) will ultimately set \mu_\xi to MAP \theta, and \sigma^2_\xi to 0)
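
The collapse in that example can be seen numerically. In this sketch (the value of `theta` and the list of `sigma` values are arbitrary choices), the prior mean is set equal to \theta and the log prior density at \theta is evaluated as \sigma shrinks.

```python
import math

def log_normal_pdf(x, mu, sigma):
    """Log density of N(mu, sigma^2) evaluated at x."""
    return -0.5 * math.log(2 * math.pi * sigma**2) - (x - mu) ** 2 / (2 * sigma**2)

theta = 1.3  # any fixed value of theta; arbitrary for this illustration
values = []
for sigma in [1.0, 0.1, 0.01, 0.001]:
    # With mu_xi set equal to theta, the prior term depends only on sigma.
    lp = log_normal_pdf(theta, mu=theta, sigma=sigma)
    values.append(lp)
    print(f"sigma={sigma:<6} log prior density at theta = {lp:.3f}")
```

The log density keeps growing as \sigma decreases, so an optimiser that is free to move both \mu_\xi and \sigma^2_\xi is rewarded for shrinking the prior to a point.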

Gon_F · April 15, 2019, 5:01am

When I was doing my own research, Empirical Bayes only seemed to me like

1. calculating the statistics of interest from your data (e.g., if you’re estimating the true mean of several sets of observations, just calculating the global mean from all the sets combined), and
2. throwing that into the respective prior.
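
A minimal sketch of that plug-in recipe, under assumptions that are mine rather than the thread's: equal-sized groups, a known within-group variance `noise_var`, a normal prior on each group mean centred on the global mean, and a method-of-moments estimate of the between-group variance. The data are invented.

```python
import statistics

groups = {
    "a": [4.1, 5.0, 4.6, 5.3],
    "b": [6.2, 5.8, 6.9, 6.4],
    "c": [3.0, 3.9, 3.4, 3.1],
}
noise_var = 0.25  # assumed known within-group variance
n = 4             # observations per group

group_means = {k: statistics.fmean(v) for k, v in groups.items()}
# Step 1: the statistic computed from all the data combined.
global_mean = statistics.fmean([x for v in groups.values() for x in v])

# Method-of-moments estimate of the between-group (prior) variance.
tau2 = max(0.0, statistics.variance(list(group_means.values())) - noise_var / n)

# Step 2: plug both into a N(global_mean, tau2) prior; the posterior mean
# of each group is its sample mean shrunk toward the global mean.
weight = tau2 / (tau2 + noise_var / n)
shrunk = {k: global_mean + weight * (m - global_mean) for k, m in group_means.items()}
print(global_mean, weight, shrunk)
```

The data enter twice here, once to set the prior and once through the likelihood, which is the double counting in question; it is mild when there are many groups and matters more when there are few.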

This seemed like serious double-counting at first, but then seemed more reasonable, upon further thought. I think I will have to dip into the philosophy of priors and more of your post’s math. Thanks for the thoughtful reply!

junpenglao · April 15, 2019, 8:18am

You should be aware that Empirical Bayes is not the same as continuously updating priors.
Also, one difficulty of Empirical Bayes is that for some parameters it is hard to compute the (sufficient) statistics to use as the prior, for example the hierarchical prior in a random-effects model.
