Hello fellow Bayesians,
I am fitting an ordered logistic model to a complex hierarchical dataset to extract variance components and ICCs on the latent scale, which necessitates a tricky random-effects structure. Let me explain the structure…
- A set of faces are rated on a Likert scale for multiple attributes (e.g. attractive, mean, approachable).
- While the same faces are rated for each attribute, a different set of raters is used for each attribute - i.e., Person A would only see the faces for attractive, Person B only for mean, and so on. So raters are nested within attribute, while faces are crossed.
- Each person rates the faces twice, which allows specific variance decompositions and ICCs to be calculated. In my field it's referred to as the 'taste' contribution, and, for a single attribute, the model is of the form: rating ~ 1 + (1|face) + (1|person) + (1|face:person). The variance of those terms, combined with the logistic error variance np.pi**2 / 3, gives the variance components and ICCs through the usual formulas (i.e. divide each component by the total variance; the agreement ICC, for example, is written out just below this list).
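For concreteness, the agreement ICC on the latent scale (the other ICCs and the VPCs computed in the code below just move terms in and out of the numerator and denominator) is:

$$
\mathrm{ICC}_{\text{agreement}} = \frac{\sigma^2_{\text{face}}}{\sigma^2_{\text{face}} + \sigma^2_{\text{person}} + \sigma^2_{\text{face:person}} + \pi^2/3}
$$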
I've been wrestling with this for a few weeks, and have managed to get clean sampling for a single attribute by forcing a zero-sum constraint on all random effects - easy for the (1|face) and (1|person) terms, but requiring a bit of trickery for (1|face:person). Below is my complete model, and a small tester dataset for just one attribute.
import pandas as pd
import pymc as pm
import pytensor.tensor as pt
import numpy as np
# Read data
ratings = pd.read_csv('https://raw.githubusercontent.com/alexjonesphd/py4psy2024/refs/heads/main/tester.csv')
ratings.head()
## Subset to a single trait
ratings = ratings.query('trait == "attractive"')
## Generate indexers from the data, and a 'telescoper' for the person:face
stim_idx, stim_label = ratings['trait:stim_id'].factorize()
user_idx, user_label = ratings['trait:user_id'].factorize()
user_stim_idx, user_stim_label = ratings['trait:user_id:stim_id'].factorize()
# To allow us some complex prior structuring we need this indexer
# it tells us for each unique face:person pairing, what person is doing the rating
within_indexer, within_label = pd.Series(user_stim_label).str.rsplit(':', n=1, expand=True)[0].factorize()
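# As a made-up example: if user_stim_label were ['attractive:p1:f1', 'attractive:p1:f2', 'attractive:p2:f1'],
# within_indexer would be [0, 0, 1] - the first two pairings belong to rater p1, the third to rater p2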
n_faces = ratings['stim_id'].nunique()
# Coords for the model dims used below: raters, stimuli, and the unique rater:stimulus pairings
coords = {
    'user': user_label,
    'stim': stim_label,
    'user:stim': user_stim_label,
}
with pm.Model(coords=coords) as ordinal_model:
    ## Set data
    Y = pm.Data('Y', ratings['rating'].to_numpy())
    ## Group-effects for user, stimuli, and interaction, non-centred
    user_σ = pm.HalfNormal('user_σ', sigma=1)
    user_α = pm.ZeroSumNormal('user_α', sigma=user_σ, dims='user')
    stim_σ = pm.HalfNormal('stim_σ', sigma=1)
    stim_α = pm.ZeroSumNormal('stim_α', sigma=stim_σ, dims='stim')
    user_stim_σ = pm.HalfNormal('user_stim_σ', sigma=1)
    user_stim_z = pm.Normal('user_stim_z', mu=0, sigma=1, dims='user:stim')
    ## user_stim_z now needs de-meaning within participant - is there a cleaner way of doing this?
    ## The indexed .inc is a scatter-add (inc_subtensor): it sums each rater's z-draws into that
    ## rater's slot, and dividing by n_faces turns those sums into per-rater means
    segment_mean = pt.zeros(within_indexer.max() + 1)[within_indexer].inc(user_stim_z) / n_faces
    scaled_z = user_stim_z - segment_mean[within_indexer]  # mean-centred within each rater
    user_stim_α = scaled_z * user_stim_σ
    ## Linear predictor: grand intercept + group effects
    β = pm.Normal('β', mu=0, sigma=3)
    η = β + user_α[user_idx] + stim_α[stim_idx] + user_stim_α[user_stim_idx]
    ## Cutpoints - there are 9 ordinal values in the data, so 8 cutpoints are needed;
    ## the first is fixed at -4 to anchor the latent scale
    cut = pm.Deterministic(
        'cuts',
        pm.math.concatenate(
            [
                pm.math.ones(1) * -4,  # first cutpoint fixed at -4
                pm.Normal('cut',
                          mu=np.linspace(-3, 4, 7),  # remaining 7 cutpoints, from -3 to 4 on the latent scale
                          sigma=0.1,
                          transform=pm.distributions.transforms.ordered),
            ]
        )
    )
    ## Likelihood
    pm.OrderedLogistic('observed',
                       eta=η,
                       cutpoints=cut,
                       compute_p=False,
                       observed=Y)
    # ------------------------------------------------------------------
    # ICCs and variance–partition coefficients (still inside the model)
    # ------------------------------------------------------------------
    # SD variables defined earlier in the model
    sd_face = stim_σ            # (1 | stim_id)          ← “face” signal
    sd_bias = user_σ            # (1 | user_id)          ← rater bias
    sd_taste = user_stim_σ      # (1 | user_id:stim_id)  ← taste
    noise = np.pi**2/3          # logistic error variance
    # --- convert to variances -----------------------------------------
    vf = sd_face**2
    vb = sd_bias**2
    vt = sd_taste**2
    # ---------- ICCs ---------------------------------------------------
    test_retest = pm.Deterministic(
        "icc_retest",
        (vf + vb + vt) / (vf + vb + vt + noise)
    )
    consistency = pm.Deterministic(
        "icc_consistency",
        vf / (vf + vt + noise)
    )
    agreement = pm.Deterministic(
        "icc_agreement",
        vf / (vf + vb + vt + noise)
    )
    taste_icc = pm.Deterministic(
        "icc_taste",
        vt / (vt + noise)
    )
    # ---------- Variance–partition coefficients -----------------------
    denom = vf + vb + vt + noise
    vpc_face = pm.Deterministic("vpc_face", vf / denom)
    vpc_bias = pm.Deterministic("vpc_bias", vb / denom)
    vpc_taste = pm.Deterministic("vpc_taste", vt / denom)
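For completeness, here is a minimal sketch of how this gets sampled and summarised (nutpie plugged in via PyMC's nuts_sampler argument, ArviZ for the summary; the draw/tune numbers are just placeholders):

import arviz as az

with ordinal_model:
    idata = pm.sample(draws=1000, tune=1000, nuts_sampler='nutpie', random_seed=1)

az.summary(
    idata,
    var_names=['icc_retest', 'icc_consistency', 'icc_agreement', 'icc_taste',
               'vpc_face', 'vpc_bias', 'vpc_taste'],
)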
I figured out the centring-within trick from reading the Discourse and interrogating ChatGPT! While this samples nicely (without the trick, it did not!), I'd like to expand this into a full generative model of the entire dataset, using partial pooling to regularise the variance estimates of each attribute, face, and person, and thus partially pool the ICCs.
I had a go at this by including a dimension for attribute/trait in the data to get a hierarchical variance for each component, e.g., for faces:
face_tau = pm.HalfNormal('face_tau', sigma=3)
stim_σ = pm.HalfNormal('stim_σ', sigma=face_tau, dims='trait')
stim_z = pm.Normal('stim_z', mu=0, sigma=1, dims='stim')
stim_α = stim_z * stim_σ[some_trait_indexer]  # some_trait_indexer would keep track of which trait each stimulus effect sits under
It gets more difficult to do this for the face:person component, because in my mind it's sort of a double-centring: first ensure each participant's effect is centred, and then ensure an appropriate centring for each trait. When I've tried this I get terribly slow sampling, even with nutpie. I think the solution might be to use pytensor.scan, but I am a total novice there. I also don't fully grasp what the centring is doing in this model - it obviously centres the draws from the prior, but the .inc line etc. is opaque to me.
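As far as I can tell, the .inc line is just a scatter-add (an inc_subtensor), so the segment mean ends up being each rater's average interaction effect. A toy NumPy version of the same computation, with made-up numbers and np.add.at standing in for the PyTensor .inc, is:

import numpy as np

# toy setup: 2 raters x 3 faces -> 6 interaction z-draws; toy_within_indexer says which rater owns each pairing
z = np.array([0.5, -1.0, 2.0, 1.0, 1.0, 4.0])
toy_within_indexer = np.array([0, 0, 0, 1, 1, 1])
toy_n_faces = 3

# scatter-add each z into its rater's slot, then divide by the number of faces per rater
segment_sum = np.zeros(toy_within_indexer.max() + 1)
np.add.at(segment_sum, toy_within_indexer, z)
segment_mean = segment_sum / toy_n_faces          # each rater's mean interaction effect

centred = z - segment_mean[toy_within_indexer]    # subtract each rater's own mean
print(centred.reshape(2, 3).sum(axis=1))          # [0. 0.] - effects now sum to zero within each rater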
I can proceed without a full model of the data - the full dataset is properly vast so I may not even get much from partial pooling, but it would be nice to do it! Any tips appreciated!