Fast approximation for inference on hierarchical models?

Hi all, I was wondering if anyone could offer some advice on a problem I have. I’ve created a simple hierarchical model (the kind you could fit in Bambi, although I’ve used PyMC directly since I’m using a Laplace error distribution rather than a Gaussian), which I’ve fit using NUTS.

This works well. The issue is that for production I need very fast inference, ideally extracting a closed-form approximation of the predicted likelihood, like I could with a Mixed Effects model in, say, StatsModels, where I can directly pull out the parameters for the mean and variance. I can be as slow as I like when fitting the model, so running full NUTS to generate a trace and then fitting an approximation to it would be fine; I just need to be very fast at inference time (and am happy to pay a predictive performance penalty to achieve this).

I took a look at the documentation for the VI options in PyMC, hoping I’d find something that gives a fast closed-form approximation of my trace, and came across this example, which recommends the Empirical approximation. But even this approximation (like all the other VI methods I looked at) seems to require sampling in order to determine the first and second moments at prediction time, which is going to be too slow for my use case.

Does anyone have advice on what options they’d recommend for my use case? Thanks very much for any help!

You are able to pull the parameters out of the trace, exactly as in Statsmodels. The only difference is that they are arrays of numbers rather than single point estimates.
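For example, something along these lines (a minimal sketch: “model” is your existing PyMC model and “beta” is a hypothetical parameter name, so substitute your own variable names):

    # Minimal sketch: "model" is your existing PyMC model and "beta" is a
    # hypothetical parameter name -- substitute your own variable names.
    import pymc as pm

    with model:
        idata = pm.sample()

    beta_draws = idata.posterior["beta"]                        # dims: (chain, draw, ...)
    beta_mean = beta_draws.mean(dim=("chain", "draw")).values   # point estimate
    beta_sd = beta_draws.std(dim=("chain", "draw")).values      # posterior spread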


I am pretty sure the mean and std/covariance are there somewhere; you don’t need to take draws if you just want the fitted parameters. @ferrine can probably confirm?


There are two options:

  1. Fit the model with NUTS and compute the mean and cov from the trace (you can use the inner representation of the Empirical approximation, the histogram, to compute the mean and cov).
  2. Use full-rank ADVI to compute the approximation; the mean and cov are available to the user, and fast sampling is also supported. (A rough sketch of both options is below.)
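Roughly along these lines (a sketch only: it assumes an existing `model` context and a MultiTrace-style `trace` for option 1, and the `.mean`/`.cov` attribute names have shifted a little between PyMC versions):

    # Sketch of both options. Assumes an existing `model` and `trace`; the
    # .mean/.cov/.std attributes have moved around between PyMC versions,
    # so check the release you are on.
    import pymc as pm

    with model:
        # Option 1: wrap an existing NUTS trace in an Empirical approximation
        # and read the moments of its histogram.
        empirical = pm.Empirical(trace)
        emp_mean = empirical.mean.eval()   # flat vector over all free RVs
        emp_cov = empirical.cov.eval()     # full covariance matrix

        # Option 2: fit full-rank ADVI; mean and cov are exposed directly,
        # and approx.sample() gives fast draws if ever needed.
        approx = pm.fit(n=50_000, method="fullrank_advi")
        advi_mean = approx.mean.eval()
        advi_cov = approx.cov.eval()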

Does this help you with the next direction to research?


Thanks so much for all the help!

So let’s say I wanted to go with the first option, the Empirical approach (or indeed the full-rank ADVI approach, as it looks like they have a similar interface?). What’s the correct way to pull out the predicted mean and variance for each entry in my test data? I read the documentation for Empirical, but I’m not sure which functions I should be using to get the fast mean/variance predictions.

This functionality is not documented, so you’ll need to use the publicly available attributes with your own user code on top.

The mean and std are there as attributes on the approximation. While it is relatively easy to get an rmap for the mean or std, it is more complicated for the covariance since it is two-dimensional.
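In PyMC3-era releases, the usual pattern for mapping the flat vectors back to named variables looked roughly like this (the bijection/rmap machinery has since been reorganised, so treat this as a sketch rather than current API):

    # Rough PyMC3-era sketch: map the flat mean/std vectors back to dicts
    # keyed by variable name. Newer PyMC releases reorganised this
    # machinery, so the exact spelling may differ.
    means_by_var = approx.bij.rmap(approx.mean.eval())
    stds_by_var = approx.bij.rmap(approx.std.eval())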

I’m not sure what you are going to do with the raw mean and cov. What I can suggest is to do a Cholesky decomposition of the cov and initialize the full-rank ADVI with those (mean + L_cov). You should take care of the diagonal in that case; just make sure the resulting cov is the one you pass.
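For the Cholesky step itself, a plain NumPy sketch (assuming `empirical` is the Empirical approximation from the earlier snippet) could look like:

    # Sketch: factor the empirical covariance into mean + L_cov for seeding
    # a full-rank approximation. NumPy returns a lower-triangular L; verify
    # that L @ L.T reproduces the covariance you intended to pass.
    import numpy as np

    mean = empirical.mean.eval()
    cov = empirical.cov.eval()
    L_cov = np.linalg.cholesky(cov)
    assert np.allclose(L_cov @ L_cov.T, cov)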


Thanks so much, I’ll give this a go!

So I’ve managed to train an ADVI model (mean field at first to learn what’s going on; I’ll try the full-rank version if I can get this to work) and I’ve stared at the code for the MeanFieldGroup you linked to. If I understand correctly, this only seems to hold the parameters for the full posterior (mu and sigma), and I’m not really sure how to get out the mu and sigma for the posterior predictive, i.e. after I’ve conditioned on my test data.

For the NUTS approach, I could get the mean predictions for my test data set with the following code:

    def predict(self, test_df: pd.DataFrame) -> np.ndarray:
        # Encode the dataframe's categorical values into integer ids.
        ids = get_pymc_encodings(test_df, self._encoder)
        with self._model:
            pm.set_data({"feature_ids": ids})
            # Draw from the posterior predictive for the new data.
            y_test = pm.sample_posterior_predictive(self._results)
        # Average over chains and draws for a mean prediction per row.
        return y_test.posterior_predictive.y.mean(axis=(0, 1)).values

So essentially I’m encoding my dataframe’s category values into integer ids, setting those as my data, sampling from the posterior predictive distribution, and taking the mean of those samples to get the predicted mean for each of my test data points.

Is there an analogous way to calculate the posterior predictive means (directly from the parameters, so without sampling) from an approximation class?