Fast approximation for inference on hierarchical models?

Hi all, I was wondering if anyone could offer some advice on a problem I have. I’ve created a simple hierarchical model (the kind you could fit in Bambi, although I’ve used PyMC directly since I’m using a Laplace error distribution rather than a Gaussian), which I’ve fit using NUTS.

This works well. The issue is that for production I need very fast inference, ideally extracting a closed-form approximation of the predicted likelihood, like I could with a Mixed Effects model in, say, StatsModels, where I can directly pull out the parameters for the mean and variance. I can be as slow as I like when fitting the model, so running full NUTS to generate a trace and then fitting an approximation to it would be fine; I just need to be very fast at inference time (and am happy to pay a predictive performance penalty to achieve this).

I took a look at the documentation for the VI options in PyMC, hoping I’d find something that gives a fast closed-form approximation of my trace, and came across this example, which recommends the Empirical approximation. But even this approximation (like all the other VI methods I looked at) seems to require sampling in order to determine the first and second moments at prediction time, which is going to be too slow for my use case.

Does anyone have advice on what options they’d recommend for my use case? Thanks very much for any help!

You are able to pull the parameters out of the trace, exactly as in Statsmodels. The only difference is that they are arrays of numbers rather than single point estimates.
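For example, something along these lines (a minimal sketch: “model” is your existing PyMC model and “beta” is a hypothetical parameter name, so substitute your own variable names):

    # Minimal sketch: "model" is your existing PyMC model and "beta" is a
    # hypothetical parameter name -- substitute your own variable names.
    import pymc as pm

    with model:
        idata = pm.sample()

    beta_draws = idata.posterior["beta"]                        # dims: (chain, draw, ...)
    beta_mean = beta_draws.mean(dim=("chain", "draw")).values   # point estimate
    beta_sd = beta_draws.std(dim=("chain", "draw")).values      # posterior spread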


I am pretty sure the mean and std/covariance are there somewhere; you don’t need to take draws if you just want the fitted parameters. @ferrine can probably confirm?


There are two options:

  1. Fit the model with NUTS and compute the mean and cov from the trace (you can use the inner representation of the Empirical approximation, the histogram, to compute the mean and cov).
  2. Use full-rank ADVI to compute the approximation; the mean and cov are available to the user, and fast sampling is also supported. (A rough sketch of both options is below.)
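Roughly along these lines (a sketch only: it assumes an existing `model` context and a MultiTrace-style `trace` for option 1, and the `.mean`/`.cov` attribute names have shifted a little between PyMC versions):

    # Sketch of both options. Assumes an existing `model` and `trace`; the
    # .mean/.cov/.std attributes have moved around between PyMC versions,
    # so check the release you are on.
    import pymc as pm

    with model:
        # Option 1: wrap an existing NUTS trace in an Empirical approximation
        # and read the moments of its histogram.
        empirical = pm.Empirical(trace)
        emp_mean = empirical.mean.eval()   # flat vector over all free RVs
        emp_cov = empirical.cov.eval()     # full covariance matrix

        # Option 2: fit full-rank ADVI; mean and cov are exposed directly,
        # and approx.sample() gives fast draws if ever needed.
        approx = pm.fit(n=50_000, method="fullrank_advi")
        advi_mean = approx.mean.eval()
        advi_cov = approx.cov.eval()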

Does this help you with the next direction to research?


Thanks so much for all the help!

So let’s say I wanted to go with the first option, the Empirical approach (or indeed the full-rank ADVI approach, as it looks like they have a similar interface?). What’s the correct way to pull out the predicted mean and variance for each entry in my test data? I read the documentation for Empirical, but I’m not sure which functions I should be using to get the fast mean/variance predictions.

This functionality is not documented, so you’ll need to use the publicly available attributes with your own user code on top.

The mean and std are there as attributes on the approximation. While it is relatively easy to get an rmap for the mean or std, it is more complicated for the covariance since it is two-dimensional.
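In PyMC3-era releases, the usual pattern for mapping the flat vectors back to named variables looked roughly like this (the bijection/rmap machinery has since been reorganised, so treat this as a sketch rather than current API):

    # Rough PyMC3-era sketch: map the flat mean/std vectors back to dicts
    # keyed by variable name. Newer PyMC releases reorganised this
    # machinery, so the exact spelling may differ.
    means_by_var = approx.bij.rmap(approx.mean.eval())
    stds_by_var = approx.bij.rmap(approx.std.eval())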

I’m not sure what you are going to do with the raw mean and cov. What I can suggest is to do a Cholesky decomposition of the cov and initialize the full-rank ADVI with those (mean + L_cov). You should take care of the diagonal in that case; just make sure the resulting cov is the one you pass.
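For the Cholesky step itself, a plain NumPy sketch (assuming `empirical` is the Empirical approximation from the earlier snippet) could look like:

    # Sketch: factor the empirical covariance into mean + L_cov for seeding
    # a full-rank approximation. NumPy returns a lower-triangular L; verify
    # that L @ L.T reproduces the covariance you intended to pass.
    import numpy as np

    mean = empirical.mean.eval()
    cov = empirical.cov.eval()
    L_cov = np.linalg.cholesky(cov)
    assert np.allclose(L_cov @ L_cov.T, cov)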


Thanks so much, I’ll give this a go!

So I’ve managed to train an ADVI model (mean field at first to learn what’s going on; I’ll try the full-rank version if I can get this to work) and I’ve stared at the code for the MeanFieldGroup you linked to. If I understand correctly, this only seems to hold the parameters for the full posterior (mu and sigma), and I’m not really sure how to get out the mu and sigma for the posterior predictive, i.e. after I’ve conditioned on my test data.

For the NUTS approach, I could get the mean predictions for my test data set with the following code:

    def predict(self, test_df: pd.DataFrame) -> np.ndarray:
        # Encode the dataframe's categorical values into integer ids.
        ids = get_pymc_encodings(test_df, self._encoder)
        with self._model:
            pm.set_data({"feature_ids": ids})
            # Draw from the posterior predictive for the new data.
            y_test = pm.sample_posterior_predictive(self._results)
        # Average over chains and draws for a mean prediction per row.
        return y_test.posterior_predictive.y.mean(axis=(0, 1)).values

So essentially I’m encoding my dataframe’s category values into integer ids, setting those as my data, sampling from the posterior predictive distribution, and taking the mean of those samples to get the predicted mean for each of my test data points.

Is there an analogous way to calculate the posterior predictive means (directly from the parameters, so without sampling) from an approximation class?