Updating priors vs using more data gives different results

I am following the Updating priors example (Updating priors — PyMC3 3.10.0 documentation), and tried to verify my intuition that using, e.g., 100 observed values with a fixed prior should give the same result as starting from that same initial prior and applying the updating method with 10 observed values at a time over 10 iterations. Makes sense?

However, the two posteriors (one after updating 10 times, the other from sampling once with all 100 observed values) look quite different, as shown below. What am I missing here?

[image: the two posteriors compared]
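For reference, the intuition itself is sound whenever the exact posterior can be carried forward as the next prior. A minimal conjugate-Normal sketch (all numbers made up here, known noise scale) where 10 batches of 10 match one batch of 100:

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=1.0, size=100)  # 100 observations, known sigma=1

def update(mu0, tau0, batch, sigma=1.0):
    """Conjugate Normal-Normal update: prior N(mu0, tau0**2) -> posterior."""
    prec = 1 / tau0**2 + len(batch) / sigma**2            # posterior precision
    mu = (mu0 / tau0**2 + batch.sum() / sigma**2) / prec  # posterior mean
    return mu, prec**-0.5

# all 100 points in one go
mu_all, tau_all = update(0.0, 10.0, data)

# 10 batches of 10, each posterior becoming the next prior
mu, tau = 0.0, 10.0
for batch in data.reshape(10, 10):
    mu, tau = update(mu, tau, batch)

print(np.allclose([mu, tau], [mu_all, tau_all]))  # True: the two routes agree
```

So any discrepancy has to come from the carried-forward prior being only an approximation of the previous posterior, not from the batching itself.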

I have only skimmed through the updating priors notebook and have never executed its code myself, so my understanding of what it is actually doing may not be completely accurate.

My guess is that the prior updates are done one variable at a time, so much of the information contained in each posterior update is lost. The posterior distributions of alpha and the betas are not independent, and are probably significantly correlated (you can check this with az.plot_pair on the initial model, for example). However, when setting the prior for the next iteration, alpha and each beta get independent priors built from their marginal posterior distributions, which is not the same as using the whole joint posterior as the prior for the new fit.
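To illustrate the point about marginals, here is a toy sketch (not the notebook's code): a fabricated, strongly correlated bivariate-normal "posterior" standing in for (alpha, beta), with the next round of priors built from the marginals alone, using plain normals in place of the notebook's Interpolated:

```python
import numpy as np

rng = np.random.default_rng(1)
# toy "posterior" for (alpha, beta): strongly correlated bivariate normal
cov = np.array([[1.0, 0.9], [0.9, 1.0]])
post = rng.multivariate_normal([0.0, 0.0], cov, size=5000)

# setting the next priors independently from each marginal drops the correlation
prior_a = rng.normal(post[:, 0].mean(), post[:, 0].std(), size=5000)
prior_b = rng.normal(post[:, 1].mean(), post[:, 1].std(), size=5000)
indep = np.column_stack([prior_a, prior_b])

print(np.corrcoef(post.T)[0, 1])   # close to 0.9: the joint posterior is correlated
print(np.corrcoef(indep.T)[0, 1])  # close to 0.0: the correlation is gone
```

Whatever the joint posterior knew about how alpha and beta move together is simply not representable as a product of univariate priors.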

This is almost certainly a problem with applying the “width extension with linear decay” multiple times, i.e. this part:

    # what was never sampled should have a small probability but not 0,
    # so we'll extend the domain and use linear approximation of density on it
    x = np.concatenate([[x[0] - 3 * width], x, [x[-1] + 3 * width]])
    y = np.concatenate([[0], y, [0]])

Its effect is visible as the spurious left tail of the “updated” distribution. This has been noticed before:

context: Can traces be used as priors? - #6 by Edderic
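A quick way to quantify what that padding does, as a numpy-only sketch that uses a histogram density as a stand-in for the notebook's gaussian_kde: the two zero-density points form triangles of base 3 * width on each side, so even a tiny density at the edge of the sampled range turns into a noticeable slab of prior mass far outside it, and that mass compounds over repeated update rounds.

```python
import numpy as np

rng = np.random.default_rng(2)
samples = rng.normal(0.0, 1.0, 5000)  # stand-in for one parameter's trace

# histogram density as a stand-in for the notebook's gaussian_kde step
y, edges = np.histogram(samples, bins=100, density=True)
x = 0.5 * (edges[:-1] + edges[1:])
width = x[-1] - x[0]

# the notebook's padding: zero-density points 3*width beyond the data,
# with the density interpolated linearly in between
tail = 0.5 * 3 * width * (y[0] + y[-1])             # triangle areas of the pads
body = np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x))  # trapezoid rule, by hand
print(tail / (tail + body))  # fraction of prior mass sitting in the padded tails
```

A few percent of the prior mass per round ends up in regions the posterior never supported, which is consistent with the distorted tail in the plot above.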
