Mixed effects model - Posterior distribution of random effects on hold-out individuals

Hello there,

Here is my setup: I have repeated measurements on individuals (a set of n_i visits at ages t_ij for individual i, each measuring a univariate continuous outcome y_ij).
I want to model the outcome with a mixed effects model using a fully Bayesian approach. I start with a simple LMM: fixed and random intercepts and slopes with respect to individuals’ ages.

I split my dataset into 3 parts:

  • a train part: all data of 80% of individuals
  • for the remaining 20% of individuals:
    • a “test known” part: 2/3 of the visits of these individuals
    • a “test to predict” part: the remaining 1/3 of visits from these same individuals (a sketch of how this split could be encoded follows the list)
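For concreteness, a minimal sketch of how this split could be encoded (my naming; assuming df has a MultiIndex with the individual id as first level, and appending the split flags as boolean index levels so the parts can be selected with df.xs later on):

import numpy as np

rng = np.random.default_rng(0)

individuals = df.index.get_level_values(0).unique()
train_inds = rng.choice(individuals, size=int(.8 * len(individuals)), replace=False)
is_train = df.index.get_level_values(0).isin(train_inds)

# within each hold-out individual, flag the first 2/3 of visits as "known"
visit_rank = df.groupby(level=0).cumcount()
n_visits = df.groupby(level=0)['Y'].transform('size')
is_known = ~is_train & (visit_rank < np.ceil(2 * n_visits / 3)).values

df['TRAIN'], df['TEST_KNOWN'] = is_train, is_known
df = df.set_index(['TRAIN', 'TEST_KNOWN'], append=True)
# the "test to predict" part is then ~TRAIN & ~TEST_KNOWN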

I’d like to:

  1. train my model (on the train part) to estimate the posterior distribution of the fixed effects only
  2. then estimate the posterior distribution of the random effects of the hold-out individuals, given data from the “test known” part and the train data (marginalizing over the fixed effects according to their posterior distribution learnt during training, which is kept fixed)
  3. finally compute whatever model metric on the posterior predictive distribution of y_ij on the “test to predict” part.

I understand that I could directly train my model with both the train and “test known” data (i.e. skipping step 2; and it worked) but I don’t want to do it that way, by design.

Before posting, I read these 2 posts, which ask similar questions:

Here’s my code:

import copy

import numpy as np
import pymc3 as pm

def model_factory(df):

    # <!> update levels (otherwise all levels from the main df are kept)
    df_ix = df.index.remove_unused_levels()

    coords = {"obs_id": np.arange(len(df)),
              "individuals": df_ix.levels[0]}

    with pm.Model(coords=coords) as lmm:

        # Data from outside world (to be seen in graphical model)
        inds_ix_v = pm.Data('individual_i', df_ix.codes[0], dims='obs_id')
        t_ij_v = pm.Data('age_ij', df['T'], dims='obs_id')

        # Fixed effects (model parameters); the prior hyperparameters
        # (mu_a0, std_a0, mu_b0, std_b0, ...) are assumed defined in the enclosing scope
        a0 = pm.Normal('a0', mu=mu_a0, sigma=std_a0)
        b0 = pm.Normal('b0', mu=mu_b0, sigma=std_b0)

        # Residual noise precision
        precision = pm.Gamma('tau', mu=mu_precision, sigma=var_precision**.5)

        # Random effects precisions
        precision_a = pm.Gamma('tau_a', mu=mu_precision_a, sigma=var_precision_a**.5)
        precision_b = pm.Gamma('tau_b', mu=mu_precision_b, sigma=var_precision_b**.5)

        # Random effects (individual parameters), orthogonal

        # Non-centered model: https://twiecki.io/blog/2017/02/08/bayesian-hierchical-non-centered/
        a_offset = pm.Normal('ai_offset', mu=0, tau=1, dims="individuals")
        a = pm.Deterministic('a_i', a0 + a_offset/precision_a**.5, dims="individuals")

        b_offset = pm.Normal('bi_offset', mu=0, tau=1, dims="individuals")
        b = pm.Deterministic('b_i', b0 + b_offset/precision_b**.5, dims="individuals")

        # Model (likelihood) (a_i _||_ b_i _||_ e_ij)

        y_hat_ij = a[inds_ix_v] + b[inds_ix_v] * t_ij_v

        y = pm.Normal('y', mu=y_hat_ij, tau=precision, observed=df['Y'], dims="obs_id")

    return lmm

lmm_train = model_factory( df.xs(True, level='TRAIN') )
with lmm_train:
    train_trace = pm.sample(4000, tune=2000, # burn-in 2k iterations + 4k samples
                            random_seed=seed, return_inferencedata=False)

Then I tried this for step 2, but visibly it does not work (the sampled random effects do not take the “test known” data into account; they just end up near the fixed effects’ means):

# keep only the fixed effects from the train trace: drop the random
# effects (and offsets) of the train individuals so they are not reused
train_trace_wo_ranef = copy.deepcopy(train_trace)

train_trace_wo_ranef.remove_values('ai_offset')
train_trace_wo_ranef.remove_values('bi_offset')
train_trace_wo_ranef.remove_values('a_i')
train_trace_wo_ranef.remove_values('b_i')

lmm_test_known = model_factory( df.xs(True, level='TEST_KNOWN') )

with lmm_test_known:
    
    #test_trace = pm.sample(trace=train_trace)
    
    # the problem: for variables that are missing from the trace,
    # sample_posterior_predictive apparently draws them from their priors,
    # so the new offsets ignore the observed "test known" data entirely
    post_samples = pm.sample_posterior_predictive(
        train_trace_wo_ranef, random_seed=seed,
        var_names=["ai_offset", "bi_offset", "a_i", "b_i"]
    )

In a word, what I’d like to do is:

  1. estimate the posterior distribution of fixed (and random) effects in a standard mixed effects model → OK
  2. keep only the posterior distribution of the fixed effects of my model (i.e. consider them as learnt once and for all)
  3. estimate the posterior distribution of “new” random effects (i.e. for never-seen groups, using their newly observed data), taking the posterior distribution of the fixed effects as an input

In a linear mixed effects model with ML point estimation, this can be done easily (there is even a closed-form formula for part 3, i.e. the estimation of the “new” random effects with the fixed effects kept fixed), but here, in a Bayesian setting with pymc3, I can’t figure out how I should proceed.
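For reference, here is what that closed formula looks like, transposed to the model above: for one hold-out individual with ages t and known outcomes y, and for one fixed draw of (a0, b0, tau, tau_a, tau_b), the conditional posterior of (a_i, b_i) is Gaussian by conjugacy. A minimal numpy sketch (my own naming); averaging its output over draws from train_trace would give exactly the marginalization I want in step 2:

import numpy as np

def ranef_posterior(t, y, a0, b0, tau, tau_a, tau_b):
    # Conditional posterior N(mean, cov) of (a_i, b_i) for one hold-out
    # individual, given one fixed draw of the fixed effects and precisions
    Z = np.column_stack([np.ones_like(t), t])  # random-effects design matrix
    P0 = np.diag([tau_a, tau_b])               # prior precision of (a_i, b_i)
    m0 = np.array([a0, b0])                    # prior mean = fixed effects
    P = P0 + tau * Z.T @ Z                     # posterior precision
    mean = np.linalg.solve(P, P0 @ m0 + tau * Z.T @ y)
    return mean, np.linalg.inv(P)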

I feel my question is close to the one asked in this other post, Predicting on hold-out data for different groups in a hierarchical model, which was also left without any answer.

Any thoughts / help on this? I’d really be grateful, thanks!

I haven’t done this, but potentially you could use the method McElreath uses in chapter 13 of Statistical Rethinking to predict on new clusters: essentially, taking the linear predictor for the fixed effects and adding random draws from the random effects based on their posterior distribution estimates.
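If I understand it right, something like this minimal sketch against your train_trace (t_new is a hypothetical vector of ages for a brand-new individual; the variable names match your model):

import numpy as np

rng = np.random.default_rng(0)

# one fresh random effect per posterior draw, from the population distribution
a_new = rng.normal(train_trace['a0'], train_trace['tau_a'] ** -.5)
b_new = rng.normal(train_trace['b0'], train_trace['tau_b'] ** -.5)

# posterior predictive draws for the new individual at ages t_new
y_new = rng.normal(a_new[:, None] + b_new[:, None] * np.asarray(t_new),
                   train_trace['tau'][:, None] ** -.5)

Note this only uses the population-level posterior, not any data observed on the new individual.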

Thank you! I’ll have a look at this reference and will keep you up to date!

The only ways I can really think of to do this are hacks. For instance, using marginal Interpolated distributions to “freeze” the posterior distribution:

https://docs.pymc.io/notebooks/updating_priors.html
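Concretely, that could look like the sketch below, reusing a simplified version of the from_posterior helper from that notebook: the shared parameters get Interpolated priors built from their train marginals, and only the new individuals’ offsets are actually learnt. (Caveat: this freezes the marginals only, so posterior correlations between the fixed effects and the precisions are lost.)

import numpy as np
import pymc3 as pm
from scipy import stats

def from_posterior(name, samples):
    # 1-D KDE of a marginal posterior, wrapped as an Interpolated prior
    smin, smax = samples.min(), samples.max()
    width = smax - smin
    x = np.linspace(smin - .1 * width, smax + .1 * width, 100)
    return pm.Interpolated(name, x, stats.gaussian_kde(samples)(x))

# variant of model_factory for the "test known" data: same offsets and
# likelihood as before, but the shared parameters come from the train trace
with pm.Model(coords=coords) as lmm_frozen:
    a0 = from_posterior('a0', train_trace['a0'])
    b0 = from_posterior('b0', train_trace['b0'])
    precision = from_posterior('tau', train_trace['tau'])
    precision_a = from_posterior('tau_a', train_trace['tau_a'])
    precision_b = from_posterior('tau_b', train_trace['tau_b'])
    # ... ai_offset / bi_offset / a_i / b_i / y exactly as in model_factory,
    # then pm.sample() on the "test known" data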

or duplicating the “new” data several times, and pairing each copy with a random sample from the fixed effects posterior.
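A rough sketch of that duplication idea (hypothetical names): with K copies, every copy k sees the same “test known” data but its own fixed draw of the shared parameters, so pooling the sampled offsets across copies approximates the marginalization over the fixed effects posterior:

import numpy as np
import pymc3 as pm

K = 20                                        # number of posterior draws / data copies
df_k = df.xs(True, level='TEST_KNOWN')
ix = df_k.index.remove_unused_levels()
n_ind, ind_ix = ix.levels[0].size, ix.codes[0]

pick = np.random.default_rng(0).choice(len(train_trace['a0']), size=K, replace=False)
a0_k, b0_k = train_trace['a0'][pick], train_trace['b0'][pick]
ta_k, tb_k = train_trace['tau_a'][pick], train_trace['tau_b'][pick]
t_k = train_trace['tau'][pick]

with pm.Model() as lmm_dup:
    a_off = pm.Normal('ai_offset', mu=0., tau=1., shape=(K, n_ind))
    b_off = pm.Normal('bi_offset', mu=0., tau=1., shape=(K, n_ind))
    a = a0_k[:, None] + a_off / ta_k[:, None] ** .5      # (K, n_ind)
    b = b0_k[:, None] + b_off / tb_k[:, None] ** .5
    mu = a[:, ind_ix] + b[:, ind_ix] * df_k['T'].values  # (K, n_obs)
    y = pm.Normal('y', mu=mu, tau=t_k[:, None],
                  observed=np.tile(df_k['Y'].values, (K, 1)))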

Another possibility would be to add a Freeze() transformation (see Transformations of a random variable from one space to another — PyMC3 3.10.0 documentation) that sets the Jacobian to 0, preventing variable updates (or to infinity, so that the inverse Jacobian is zero; you get the idea). Then you could have something like:

a_train = pm.Deterministic('a_i_tr', a0 + train_offset / precision_a ** .5)
a_test = pm.Deterministic('a_i_te', Freeze(a0) + test_offset / precision_a ** .5)
[...]  # use a_train to define y_hat_ij_train; a_test to define y_hat_ij_test
y_tr = pm.Normal('y_train', mu=y_hat_ij_train, tau=precision, observed=train_df['Y'], dims="obs_id_train")
y_te = pm.Normal('y_test', mu=y_hat_ij_test, tau=precision, observed=test_df['Y'], dims="obs_id_test")

Thank you @chartl for these ideas! I’ll try to implement one of them and see what the outcome is!

@dilsher_dhillon after having a look at the reference, it wasn’t of help because, as far as I understood, the author only proposes to estimate the variability around the fixed effects for new clusters (indeed, by adding random draws from the random effects posterior distribution) when there is no actual data for these new clusters - which is not my case here (I do have some data for these new clusters, so I should go deeper than that!)