Fitting multiple independent hierarchical models at once

Hello everyone,

I’ve been trying to get comfortable with using PyMC to fit several models at once in a single step, but I need a sanity check on whether I’m doing it properly, because I’m finding it difficult! Some background:

I have a large-ish dataset (~460_000 rows): ~2,200 stimuli rated under 14 different conditions by a range of observers. Not all observers see all stimuli, and certainly not under every condition, so the non-common effects are not very structured. My ultimate aim is to compute intra-class correlations for each of the 14 conditions by estimating a random-intercepts model per condition (e.g., rating ~ 1 + (1|observer) + (1|stimuli)). Rather than just sending this through Bambi in a for loop (my current solution), I was inspired by the new book by Osvaldo, Ravin, and Junpeng, as well as this post by Ben Vincent - I hadn’t realised it was possible to fit multiple, independent models in a single model instance.

This is my attempt so far at having 14 hierarchical models in one, using random data that mirrors the structure of my own. The prior predictive samples look OK, but full sampling is extremely slow, which I assume means I’ve got something wrong somewhere. The shapes and dimensions of all the parameters are melting my brain! Any advice is really appreciated, even if it’s just for my learning!

import pandas as pd
import pymc as pm

# Read data
pmd = pd.read_csv('https://osf.io/djmh3/download')

trait_index, traits = pd.factorize(pmd['trait'])
ppt_index, ppts = pd.factorize(pmd['subid'])
stim_index, stim = pd.factorize(pmd['stimuli'])

coords = {'trait': traits, 'ppts': ppts, 'stim': stim}

with pm.Model(coords=coords) as multi_model:
    
    # Hyperpriors on group-specific effects, one per trait (condition).
    # Non-centered parameterisation: trait-level mean/sd plus a unit-normal
    # offset per participant; indexing with trait_index / ppt_index
    # broadcasts everything out to one value per data row.
    ppt_mu = pm.Normal('ppt_μ', mu=0, sigma=10, dims='trait')
    ppt_sigma = pm.HalfNormal('ppt_sigma', 5, dims='trait')
    ppt_offset = pm.Normal('ppt_offset', mu=0, sigma=1, dims='ppts')
    ppt_intercept = pm.Deterministic(
        'ppt_intercept',
        ppt_mu[trait_index] + ppt_offset[ppt_index] * ppt_sigma[trait_index]
    )

    # Same structure for the stimulus intercepts
    stim_mu = pm.Normal('stim_μ', mu=0, sigma=1, dims='trait')
    stim_sigma = pm.HalfNormal('stim_sigma', 5, dims='trait')
    stim_offset = pm.Normal('stim_offset', mu=0, sigma=1, dims='stim')
    stim_intercept = pm.Deterministic(
        'stim_intercept',
        stim_mu[trait_index] + stim_offset[stim_index] * stim_sigma[trait_index]
    )

    # Standard and auxiliary params
    β0 = pm.Normal('β0', mu=0, sigma=10, dims='trait')
    σ = pm.HalfNormal('σ', 5, dims='trait')

    # Add intercept terms together
    μ = β0[trait_index] + ppt_intercept + stim_intercept
    
    # Likelihood
    pm.Normal('ll', mu=μ, sigma=σ[trait_index], observed=pmd['rating'].values)
    
    # prior pc samples OK
    prior_pc = pm.sample_prior_predictive()
    
    # posterior not so much
    trace = pm.sample()
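To convince myself the indexing/broadcasting does what I think, here is a pure-NumPy mock of the per-row intercept construction, with hypothetical toy sizes (no PyMC needed, just checking shapes):

```python
import numpy as np

rng = np.random.default_rng(0)

# toy sizes (hypothetical, just to exercise the broadcasting)
n_rows, n_traits, n_ppts = 20, 3, 5
trait_index = rng.integers(0, n_traits, n_rows)
ppt_index = rng.integers(0, n_ppts, n_rows)

ppt_mu = rng.normal(size=n_traits)            # one mean per trait
ppt_sigma = np.abs(rng.normal(size=n_traits))  # one sd per trait
ppt_offset = rng.normal(size=n_ppts)           # one offset per participant

# fancy indexing expands everything to one value per data row
ppt_intercept = ppt_mu[trait_index] + ppt_offset[ppt_index] * ppt_sigma[trait_index]
print(ppt_intercept.shape)  # (20,) — one intercept per row
```

So the Deterministic ends up with one entry per data row, which with ~460_000 rows is a lot to store per posterior draw.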

Instead of estimating 14 separate models, one per condition, why not fit everything in one model? In Bambi it is equivalent to rating ~ condition + (condition|observer) + (condition|stimuli)

Thanks @junpenglao, I’ll try that. Would including the slope for condition get in the way of computing the ICCs? I need one ICC per condition, so intuitively it feels like each condition should be separate. I’ll see if I can get the model you describe sampling, though the size of the data may cause issues! Thank you!
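For reference, once the posterior draws of the variance components are in hand, the per-condition ICC is just a ratio of variances, so the condition slopes shouldn't be a problem in principle. A minimal NumPy sketch with made-up posterior draws (the names ppt_sigma / stim_sigma / sigma and the (draws, traits) layout are assumptions matching the model above, not real output):

```python
import numpy as np

rng = np.random.default_rng(1)

# made-up posterior draws of standard deviations, shape (n_draws, n_traits)
n_draws, n_traits = 1000, 14
ppt_sigma = np.abs(rng.normal(1.0, 0.1, size=(n_draws, n_traits)))
stim_sigma = np.abs(rng.normal(0.8, 0.1, size=(n_draws, n_traits)))
sigma = np.abs(rng.normal(0.5, 0.1, size=(n_draws, n_traits)))

# observer ICC per condition: share of total variance due to observers
total_var = ppt_sigma**2 + stim_sigma**2 + sigma**2
icc_ppt = ppt_sigma**2 / total_var

# summarise: posterior mean ICC for each of the 14 conditions
icc_mean = icc_ppt.mean(axis=0)
print(icc_mean.shape)  # (14,)
```

Computing the ratio draw-by-draw like this gives a full posterior for each ICC, not just a point estimate.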