Mixed effects model - Posterior distribution of random effects on hold-out individuals

Hello there,

Here is my setup: I have some repeated measurements on individuals (a set of n_i visits at ages t_ij for individual i, each measuring an univariate continuous outcome y_ij).
I want to model outcome with a mixed effects model using a fully Bayesian approach. I start by a simple LMM: fixed and random intercepts and slopes with respect to individuals’ ages.

I split my dataset into 3 parts:

  • a train part: all data of 80% of individuals
  • on remaining 20% of individuals
    • a “test known” part: 2/3 of visits of the remaining 20% of individuals
    • a “test to predict” part: the remaining 1/3 of visits from these same individuals

I’d like to:

  1. train my model (on train part) so to estimate posterior distribution of fixed effects only
  2. then estimate posterior distribution of random effects on hold-out individuals, given data from the “test known” part and train data (marginalization on fixed effects according to fixed during train posterior distribution of fixed effects)
  3. finally estimate whatever model metric on posterior predictive distribution of y_ijon “test to predict” part.

I understand that I could directly train my model with both train & “test known” data (i.e. skipped step 2. - and it worked) but I don’t want to do it by design.

Before posting, I read these 2 posts which have similar questions:

Here’s my code:

def model_factory(df):

    # <!> update levels (otherwise all levels from main df are kept)
    df_ix = df.index.remove_unused_levels()
    coords={"obs_id": np.arange(len(df)),
            "individuals": df_ix.levels[0]}

    with pm.Model(coords=coords) as lmm:

        # Data from outside world (to be seen in graphical model)
        inds_ix_v = pm.Data('individual_i', df_ix.codes[0], dims='obs_id')
        t_ij_v = pm.Data('age_ij', df['T'], dims='obs_id')

        # Fixed effects (model parameters)
        a0 = pm.Normal('a0', mu=mu_a0, sigma=std_a0)
        b0 = pm.Normal('b0', mu=mu_b0, sigma=std_b0)

        precision = pm.Gamma('tau', mu=mu_precision, sigma=var_precision**.5)

        precision_a = pm.Gamma('tau_a', mu=mu_precision_a, sigma=var_precision_a**.5)
        precision_b = pm.Gamma('tau_b', mu=mu_precision_b, sigma=var_precision_b**.5)

        # Random effects (individual parameters), orthogonal

        # Non-centered model: https://twiecki.io/blog/2017/02/08/bayesian-hierchical-non-centered/
        a_offset = pm.Normal('ai_offset', mu=0, tau=1, dims="individuals")
        a = pm.Deterministic('a_i', a0 + a_offset/precision_a**.5, dims="individuals")

        b_offset = pm.Normal('bi_offset', mu=0, tau=1, dims="individuals")
        b = pm.Deterministic('b_i', b0 + b_offset/precision_b**.5, dims="individuals")

        # Model (likelihood) (a_i _||_ b_i _||_ e_ij)

        y_hat_ij = a[inds_ix_v] + b[inds_ix_v] * t_ij_v

        y = pm.Normal('y', mu=y_hat_ij, tau=precision, observed=df['Y'], dims="obs_id")

    return lmm
lmm_train = model_factory( df.xs(True, level='TRAIN') )
with lmm_train:
    train_trace = pm.sample(4000, tune=2000, # burn-in 2k iterations + 4k samples
                            random_seed=seed, return_inferencedata=False)

Then I tried this for step 2 but visibly it does not work (not taking into account test known data but only being near fixed effects mean):

train_trace_wo_ranef = copy.deepcopy(train_trace)


lmm_test_known = model_factory( df.xs(True, level='TEST_KNOWN') )

with lmm_test_known:
    #test_trace = pm.sample(trace=train_trace)
    post_samples = pm.sample_posterior_predictive(
        train_trace_wo_ranef, random_seed=seed, 
        var_names=["ai_offset", "bi_offset", "a_i", "b_i"]

In a word, what I’d like to do is:

  1. Estimate posterior distribution of fixed (and random) effects in a standard mixed effect model → OK
  2. Only keep the posterior distribution of fixed effects of my model (i.e. consider them as learnt once for all)
  3. Estimate posterior distribution of (“new”) random effects (= on never seen data, of never seen groups), having the distribution of fixed effects as an input

In a linear mixed effects, with ML point estimation, this can be done easily (there’s even a closed formula for part 3, i.e. estimation of “new” random effects, fixed effects being kept fixed), but here in a Bayesian setting with pymc3 I can’t figure out how I should proceed.

I feel my question is close to the one asked in this other post Predicting on hold-out data for different groups in a hierarchical model that was also left without any answer.

Any thoughts / help on this ? I’d really be grateful, thanks !

I haven’t done this, but potentially you could use the method mcelreath uses in chapter 13 of Statistical rethinking to predict on new clusters. Essentially taking the linear predictor for the fixed effects and adding random draws from the random effects based off their posterior distribution estimates.

Thank you! I’ll have a look at this reference and will keep you up date!

The only ways I can really think about doing this are hacks. For instance using marginal Interpolated distributions to “freeze” the posterior distribution:


or, duplicating the “new” data several times, and pairing each copy with a random sample from the fixed effect posterior.

Another possibility would be to add a Freeze() transformation (Transformations of a random variable from one space to another. — PyMC3 3.10.0 documentation) that sets the Jacobian to 0 preventing variable updates. (Or infinity so the inverse Jacobian is zero – you get the idea.) Then you could have something like

a_train = pm.Deterministic('a_i_tr', a_0 + train_offset/precision_a ** .5)
a_test = pm.Deterministic('a_i_te', Freeze(a_0) + test_offset/precision_a ** .5)
[...]  # use a_train to define y_hat_ij_train; a_test to define y_hat_ij_test
y_tr = pm.Normal('y_train', mu=y_hat_ij_train, tau=precision, observed=train_df['Y'], dims="obs_id")
y_te = pm.Normal('y_test', mu=y_hat_ij_test, tau=precision, observed=test_df['Y'], dims="obs_id")

Thank you @chartl for these ideas! I’ll try to implement one of these to see what’s the outcome!

@dilsher_dhillon after having a look at the reference, it wasn’t of help because, as far as I understood, the author solely propose to estimate variability around fixed effects (indeed by adding random draws from random effects posterior distribution) for new clusters, having no actual data for these new clusters - which is not my case here (I have some new data for these new clusters, so I should go deeper than that!)