Labeled coords and dims in hierarchical group setting with different sizes

Dear all,
I have already read this tutorial and the post Coordinates in PyMC & InferenceData Objects (Computational Behavior Lab), in addition to many blog posts, but I just can’t wrap my head around the coords and shape parameters and what format the data needs to be in.

I have experimental data from two different groups (aka conditions) with a different number of participants in each (~30 per group, labeled consecutively, i.e. 1-60). Then I have a different number of observations for each participant (~25).

I set up a hierarchical model: each participant’s parameters should be drawn from the respective group distribution. The problem is that I get 60 parameters, i.e. parameters for condition 1 from a participant that has not even been in group 1. I also tried labeling the participants per condition starting from 1, but that didn’t work either. I also don’t want to assume some common distribution between participant 1 of condition 1 and participant 1 of condition 2. If I use the “old” shape parameter to force the dimensions that I want, i.e. (two conditions, participants), I get broadcasting errors.

This is my code:

import numpy as np
import pandas as pd
import pymc as pm

exp_idx, experiment = pd.factorize(df["exp_idx"], sort=True)
participant_idx, participant_codes = pd.factorize(df["participant"], sort=True)
coords = {"Experiment": [1,2], "obs_id": np.arange(df.shape[0]), "participants": participant_codes.tolist(),}
with pm.Model(coords=coords) as hier_model_repara:
    a = pm.Data("a", df.a.values, dims="obs_id")
    a_obs = pm.Data("a_obs", df.a_obs.values, dims="obs_id")
    b = pm.Data("b", df.b.values, dims="obs_id")
    b_obs = pm.Data("b_obs", df.b_obs.values, dims="obs_id")
    level = pm.MutableData('level', df.level_scaled.values, dims='obs_id')
    nr_o = pm.Data('nr_o', df.nr_o.values, dims="obs_id")
    nr = pm.Data('nr', df.nr_trials.values, dims="obs_id")   
    
    # experiment dimensions -> 2
    # 3 hyper priors for group level (adaptation, intercept, level)
    mu_adap = pm.Normal("mu_adap", 0.0, sigma=10.0, dims="Experiment")
    sigma_adap = pm.HalfCauchy('sigma_adap', 5, dims="Experiment")
    mu_i = pm.Normal('mu_i', mu=0, sigma=2, dims = "Experiment")
    sigma_i = pm.HalfCauchy('sigma_i', 5, dims="Experiment")
    mu_l = pm.Normal('mu_l', mu=0, sigma=2, dims = "Experiment")
    sigma_l = pm.HalfCauchy('sigma_l', 5, dims="Experiment")
    
    
    # individual participant parameters
    adaptation_offset = pm.Normal('a_offset', mu=0, sigma=1, dims=("participants","Experiment"))
    adaptation = pm.Deterministic("adaptation", mu_adap + adaptation_offset * sigma_adap)
    
    intercept_offset = pm.Normal('i_offset', mu=0, sigma=1, dims=("participants","Experiment"))
    intercept = pm.Deterministic("intercept", mu_i + intercept_offset *sigma_i)
    
    level_offset = pm.Normal('m_offset', mu=0, sigma=1, dims=("participants","Experiment"))
    levelscale = pm.Deterministic("levelscale", mu_l + level_offset *sigma_l)
           
    A_diff = a - a_obs
    B_diff = b_obs - b 
    diff = A_diff + B_diff 

    diff_adapt = diff * adaptation[participant_idx,exp_idx]

    p = pm.Deterministic('p', pm.math.sigmoid(intercept[participant_idx,exp_idx] + diff_adapt + level*levelscale[participant_idx,exp_idx]), dims='obs_id')
    n_os = pm.Binomial('respone',n=nr, p=p, observed = nr_o, dims='obs_id')

In addition to that, the hierarchical model performs worse (tested with LOO) than the simple pooled model. Is that even possible? From everything I have read, everyone advocates using a hierarchical model, as it should explain more variance and also fit the actual design better. What am I doing wrong?
Help is highly appreciated!


Just looking through your model quickly, it looks like you have a separate adaptation parameter for every combination of participant and experiment (i.e., dims=("participants","Experiment")). So 2 per participant. I assume this is not what you want. I would assume you want 1 per participant, but have these participant-level parameters drawn from separate hyperparameters. But you have a single list of participant IDs and no way to figure out which condition/group/Experiment each is associated with. So you can do something like this?

adaptation_offset = pm.Normal('a_offset',
                              mu=0,
                              sigma=1,
                              dims="participants"
)
adaptation = pm.Deterministic("adaptation",
                              mu_adap[exp_idx] +
                              (adaptation_offset * sigma_adap[exp_idx])
)

If I didn’t screw that up, it should create 1 adaptation_offset per participant, and then create the per-participant adaptation by selecting the appropriate, experiment-specific values of mu_adap and sigma_adap. Similar sorts of things can be done for the other parameters.

Thank you for your response!

I tried this already, but when I adjust the code that follows the lines you posted like this:

    diff_adapt = pm.Deterministic("adaptation_diff", adaptation_diff * adaptation[exp_idx])

    p = pm.Deterministic('p', pm.math.sigmoid(intercept[exp_idx] + diff_adapt + level*levelscale[exp_idx]))
    n_os = pm.Binomial('respone', n=nr, p=p, observed=nr_o, dims='obs_id')

the model compiles, but when sampling I get ValueError: Input dimension mismatch. One other input has shape[0] = 1650, but input[1].shape[0] = 60. (I have 1650 observations and 60 participants.)
I also tried different variants of adaptation[exp_idx], without any index and with adaptation[exp_idx, participant], but those don't work either.
If I don't specify the dims of the code that follows at all, I get AssertionError: Could not broadcast dimensions even before sampling.

I’m not sure what adaptation_diff is as it wasn’t in your original model and I’m not sure what line is throwing the error without a more complete error message/traceback. Can you provide the model you are now running?

I just re-wrote this line so that I can see the adaptation-difference values after sampling from the model. Everything else stayed the same, except that I removed “experiment” as a dim for the offset parameters and adjusted the mus/sigmas with the index, as you proposed.

So when I specify the parameters intercept, adaptation and level to be participant specific, I would assume I also need to index them when using the variables later in the model to calculate other parameters, don’t I?

That line is going to break because this line

adaptation = pm.Deterministic("adaptation",
                              mu_adap[exp_idx] +
                              (adaptation_offset * sigma_adap[exp_idx])
)

creates 1 value of adaptation per participant. Asking for adaptation[participant_idx,exp_idx] still assumes you have 2 values per participant.
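
A tiny NumPy sketch (with made-up sizes, just to illustrate the indexing) might make the shapes concrete:

    import numpy as np

    n_participants, n_obs = 60, 1650
    participant_idx = np.random.randint(0, n_participants, size=n_obs)  # one entry per observation
    exp_idx = np.random.randint(0, 2, size=n_obs)                       # one entry per observation

    adaptation_1d = np.zeros(n_participants)               # one value per participant (dims="participants")
    print(adaptation_1d[participant_idx].shape)            # (1650,) -> broadcasts out to observation length

    adaptation_2d = np.zeros((n_participants, 2))          # one value per participant AND per experiment
    print(adaptation_2d[participant_idx, exp_idx].shape)   # also (1650,), but this only makes sense if every
                                                           # participant really has a value in both experiments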

I agree with what you say, but even with diff_adapt = diff * adaptation[exp_idx] I get the error about non-matching shapes
(and of course I also changed [exp_idx] for intercept and level).

It’s difficult to do much without example code and/or full error messages.

Ok sorry, I didn’t want to spam. Here is everything:

The coords:

participant_idx,participant_codes = pd.factorize(df["participant"])
exp_idx,exp_codes = pd.factorize(df["exp_idx"])
coords = {'participants': participant_codes.tolist(),
          'experiment': [1,2],
          'obs_id': np.arange(df['nr_o'].shape[0])
         }

The model:

with pm.Model(coords=coords) as hier_model_repara:
    a = pm.Data("a", df.a.values, dims="obs_id")
    a_obs = pm.Data("a_obs", df.a_obs.values, dims="obs_id")
    b = pm.Data("b", df.b.values, dims="obs_id")
    b_obs = pm.Data("b_obs", df.b_obs.values, dims="obs_id")
    level = pm.MutableData('level', df.level_scaled.values, dims='obs_id')
    nr_o = pm.Data('nr_o', df.nr_o.values, dims="obs_id")
    nr = pm.Data('nr', df.nr_trials.values, dims="obs_id")
    
    # experiment dimensions -> 2 
    mu_a = pm.Normal("mu_a", 0.0, sigma=10.0, dims="experiment")
    sigma_a = pm.HalfCauchy('sigma_a', 5, dims="experiment")
    mu_i = pm.Normal('mu_i', mu=0, sigma=2, dims = "experiment")
    sigma_i = pm.HalfCauchy('sigma_i', 5, dims="experiment")
    mu_l = pm.Normal('mu_l', mu=0, sigma=2, dims = "experiment")
    sigma_l = pm.HalfCauchy('sigma_l', 5, dims="experiment")
    
    
    adaptation_offset = pm.Normal('a_offset', 
                                  mu=0, sigma=1,  
                                  dims=("participants"))
    adaptation = pm.Deterministic("adaptation", 
                                  mu_a[exp_idx] + (adaptation_offset * sigma_a[exp_idx]))
    
    intercept_offset = pm.Normal('i_offset', 
                                 mu=0, sigma=1, 
                                 dims=("participants"))
    intercept = pm.Deterministic("intercept", 
                                 mu_i[exp_idx] + (intercept_offset *sigma_i[exp_idx]))
    
    level_offset = pm.Normal('l_offset', 
                             mu=0, 
                             sigma=1, 
                             dims=("participants"))
    levelscale = pm.Deterministic("levelscale", 
                                  mu_l[exp_idx] + (level_offset *sigma_l[exp_idx]))
           
    a_diff = a - a_obs
    b_diff = b_obs - b
    adaptation_diff = a_diff + b_diff

    diff_adapt = pm.Deterministic("adaptation_diff", adaptation_diff * adaptation[exp_idx])

    p = pm.Deterministic('p', pm.math.sigmoid(intercept[exp_idx] + diff_adapt  + level*levelscale[exp_idx]))
    n_os = pm.Binomial('respone',n=nr, p=p, observed = nr_o, dims='obs_id')

The error:

Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
File /projects/crunchie/miniconda3/envs/pymc_env/lib/python3.10/site-packages/aesara/compile/function/types.py:975, in Function.__call__(self, *args, **kwargs)
    973 try:
    974     outputs = (
--> 975         self.vm()
    976         if output_subset is None
    977         else self.vm(output_subset=output_subset)
    978     )
    979 except Exception:

ValueError: Input dimension mismatch. One other input has shape[0] = 1650, but input[1].shape[0] = 60.

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
Input In [645], in <cell line: 1>()
      1 with pooled_model_experiment_repara:
----> 2     adaptation_pooled_model_experiment_repara = pm.sample(draws=4000, chains=4, tune=1000, cores=16, return_inferencedata=True)

File /projects/crunchie/miniconda3/envs/pymc_env/lib/python3.10/site-packages/pymc/sampling.py:540, in sample(draws, step, init, n_init, initvals, trace, chain_idx, chains, cores, tune, progressbar, model, random_seed, discard_tuned_samples, compute_convergence_checks, callback, jitter_max_retries, return_inferencedata, idata_kwargs, mp_ctx, **kwargs)
    538         [kwargs.setdefault(k, v) for k, v in nuts_kwargs.items()]
    539     _log.info("Auto-assigning NUTS sampler...")
--> 540     initial_points, step = init_nuts(
    541         init=init,
    542         chains=chains,
    543         n_init=n_init,
    544         model=model,
    545         random_seed=random_seed_list,
    546         progressbar=progressbar,
    547         jitter_max_retries=jitter_max_retries,
    548         tune=tune,
    549         initvals=initvals,
    550         **kwargs,
    551     )
    553 if initial_points is None:
    554     # Time to draw/evaluate numeric start points for each chain.
    555     ipfns = make_initial_point_fns_per_chain(
    556         model=model,
    557         overrides=initvals,
    558         jitter_rvs=filter_rvs_to_jitter(step),
    559         chains=chains,
    560     )

File /projects/crunchie/miniconda3/envs/pymc_env/lib/python3.10/site-packages/pymc/sampling.py:2494, in init_nuts(init, chains, n_init, model, random_seed, progressbar, jitter_max_retries, tune, initvals, **kwargs)
   2487 _log.info(f"Initializing NUTS using {init}...")
   2489 cb = [
   2490     pm.callbacks.CheckParametersConvergence(tolerance=1e-2, diff="absolute"),
   2491     pm.callbacks.CheckParametersConvergence(tolerance=1e-2, diff="relative"),
 2492 ]
-> 2494 initial_points = _init_jitter(
   2495     model,
   2496     initvals,
   2497     seeds=random_seed_list,
   2498     jitter="jitter" in init,
   2499     jitter_max_retries=jitter_max_retries,
   2500 )
   2502 apoints = [DictToArrayBijection.map(point) for point in initial_points]
   2503 apoints_data = [apoint.data for apoint in apoints]

File /projects/crunchie/miniconda3/envs/pymc_env/lib/python3.10/site-packages/pymc/sampling.py:2388, in _init_jitter(model, initvals, seeds, jitter, jitter_max_retries)
   2386 if i < jitter_max_retries:
   2387     try:
-> 2388         model.check_start_vals(point)
   2389     except SamplingError:
   2390         # Retry with a new seed
   2391         seed = rng.randint(2**30, dtype=np.int64)

File /projects/crunchie/miniconda3/envs/pymc_env/lib/python3.10/site-packages/pymc/model.py:1791, in Model.check_start_vals(self, start)
   1785     valid_keys = ", ".join(self.named_vars.keys())
   1786     raise KeyError(
   1787         "Some start parameters do not appear in the model!\n"
   1788         f"Valid keys are: {valid_keys}, but {extra_keys} was supplied"
   1789     )
-> 1791 initial_eval = self.point_logps(point=elem)
   1793 if not all(np.isfinite(v) for v in initial_eval.values()):
   1794     raise SamplingError(
   1795         "Initial evaluation of model at starting point failed!\n"
   1796         f"Starting values:\n{elem}\n\n"
   1797         f"Initial evaluation results:\n{initial_eval}"
   1798     )

File /projects/crunchie/miniconda3/envs/pymc_env/lib/python3.10/site-packages/pymc/model.py:1832, in Model.point_logps(self, point, round_vals)
   1826 factors = self.basic_RVs + self.potentials
   1827 factor_logps_fn = [at.sum(factor) for factor in self.logp(factors, sum=False)]
   1828 return {
   1829     factor.name: np.round(np.asarray(factor_logp), round_vals)
   1830     for factor, factor_logp in zip(
   1831         factors,
-> 1832         self.compile_fn(factor_logps_fn)(point),
   1833     )
   1834 }

File /projects/crunchie/miniconda3/envs/pymc_env/lib/python3.10/site-packages/pymc/aesaraf.py:695, in PointFunc.__call__(self, state)
    694 def __call__(self, state):
--> 695     return self.f(**state)

File /projects/crunchie/miniconda3/envs/pymc_env/lib/python3.10/site-packages/aesara/compile/function/types.py:988, in Function.__call__(self, *args, **kwargs)
    986     if hasattr(self.vm, "thunks"):
    987         thunk = self.vm.thunks[self.vm.position_of_error]
--> 988     raise_with_op(
    989         self.maker.fgraph,
    990         node=self.vm.nodes[self.vm.position_of_error],
    991         thunk=thunk,
    992         storage_map=getattr(self.vm, "storage_map", None),
    993     )
    994 else:
    995     # old-style linkers raise their own exceptions
    996     raise

File /projects/crunchie/miniconda3/envs/pymc_env/lib/python3.10/site-packages/aesara/link/utils.py:534, in raise_with_op(fgraph, node, thunk, exc_info, storage_map)
    529     warnings.warn(
    530         f"{exc_type} error does not allow us to add an extra error message"
    531     )
    532     # Some exception need extra parameter in inputs. So forget the
    533     # extra long error message in that case.
--> 534 raise exc_value.with_traceback(exc_trace)

File /projects/crunchie/miniconda3/envs/pymc_env/lib/python3.10/site-packages/aesara/compile/function/types.py:975, in Function.__call__(self, *args, **kwargs)
    972 t0_fn = time.time()
    973 try:
    974     outputs = (
--> 975         self.vm()
    976         if output_subset is None
    977         else self.vm(output_subset=output_subset)
    978     )
    979 except Exception:
    980     restore_defaults()

ValueError: Input dimension mismatch. One other input has shape[0] = 1650, but input[1].shape[0] = 60.
Apply node that caused the error: Elemwise{Composite{(i0 + (i1 * i2))}}[(0, 0)](AdvancedSubtensor1.0, i_offset, AdvancedSubtensor1.0)
Toposort index: 29
Inputs types: [TensorType(float64, (None,)), TensorType(float64, (None,)), TensorType(float64, (None,))]
Inputs shapes: [(1650,), (60,), (1650,)]
Inputs strides: [(8,), (8,), (8,)]
Inputs values: ['not shown', 'not shown', 'not shown']
Outputs clients: [[AdvancedSubtensor1(Elemwise{Composite{(i0 + (i1 * i2))}}[(0, 0)].0, TensorConstant{[0 0 0 ... 1 1 1]})]]

HINT: Re-running with most Aesara optimizations disabled could provide a back-trace showing when this node was created. This can be done by setting the Aesara flag 'optimizer=fast_compile'. If that does not work, Aesara optimizations can be disabled with 'optimizer=None'.
HINT: Use the Aesara flag `exception_verbosity=high` for a debug print-out and storage map footprint of this Apply node.

Your code runs for me, but I had to assume what your data looks like. So maybe provide the output of df.head()?

   exp_idx  participant          a         b      a_obs     b_obs  level_scaled  nr_o  nr_trials
0        1            1  10.575617  1.609765  10.575617  1.609765          -0.5     6         35
1        1            1  10.575617  1.609765  10.575617  1.609765           0.0    18         36
2        1            1  10.575617  1.609765  10.575617  1.609765           0.5    25         36
3        1            1  10.575617  2.073440  10.575617  1.609765          -1.0     3         36
4        1            1  10.575617  2.073440  10.575617  1.609765          -0.5     1         36

Ah, I failed to adapt the per-participant offset. Try this?

    adaptation_offset = pm.Normal('a_offset', 
                                  mu=0, sigma=1,  
                                  dims=("participants"))
    adaptation = pm.Deterministic("adaptation", 
                                  mu_a[exp_idx] + (adaptation_offset[participant_idx] * sigma_a[exp_idx])
                                  )

So now there is a mu_a per experiment, a sigma_a per experiment, and an offset per participant. Each gets selected according to the appropriate index. I think.
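
If you want to sanity-check the shapes before sampling (assuming a recent PyMC where pm.draw is available), something like this after building the model should confirm it:

    print(pm.draw(adaptation_offset).shape)  # should be (60,): one offset per participant
    print(pm.draw(adaptation).shape)         # should be (1650,): one value per observation row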

Thank you! It samples now…
I think that’s exactly connected to what I don’t understand about the coords / dimensions: why don’t I have to index adaptation (and the other participant specific parameters) in the code that follows with both indices like this

diff_adapt = pm.Deterministic("adaptation_diff", adaptation_diff * adaptation[participant_idx,exp_idx])

Now I am checking the results. For example, if I select a participant who I know belongs only to experiment group 1, e.g.

coords = {"experiment": [2],"participants": [1]};
az.plot_posterior(trace, coords=coords)

there should be no posteriors because participant 1 has no observations for experiment 2, but strangely there are? So I assume it is not working correctly…?

Similarly:

coords = {"participants": [1]};
az.plot_posterior(trace, coords=coords);

outputs two parameters for each experiment’s mu_a, mu_i, mu_l and their sigmas. Why is that when participant 1 did not even have values for experiment 2?

Hi all,

I am facing a similar issue. I have very similar data, with 77 subjects, 6 conditions per subject, and 24 trials per condition, with Bernoulli responses (0s and 1s). My model is very simple: basically a linear equation plus a softmax to convert the output of the equation to a Bernoulli probability. Eventually I want to build a hierarchical prior, because the conditions are within-subject and therefore sit one level lower in the hierarchy. But right now I'm just writing all parameters as one matrix with dimensions (ncond x nsubj), i.e. (6 x 77). However, I get a dimension-mismatch error. Can anyone help me with this, please? I think I wrote it straight from the template.

subj_id_idxs, subj_id_unique = pd.factorize(data.id, sort=True)
condition_idxs, condition_unique = pd.factorize(data.conditions, sort=True)
coords = {
    "subj_id_unique": subj_id_unique,
    "obs_id": np.arange(data.shape[0]),
    "conditions": condition_unique,
}
with pm.Model(coords=coords) as m:
    subj_id_idxs = pm.ConstantData("subj_id_idxs", subj_id_idxs, dims="obs_id")
    condition_idxs = pm.ConstantData("condition_idxs", condition_idxs, dims="obs_id")
    block_trial_num = pm.ConstantData("block_trial_num", data.block_trial_num.values, dims="obs_id")

    R0 = pm.ConstantData('mean_prior_reward_0', data.mean_prior_reward_0.values, dims="obs_id")
    R1 = pm.ConstantData('mean_prior_reward_1',data.mean_prior_reward_1.values, dims="obs_id")
    I0 = pm.ConstantData('information_0',data.information_0.values, dims="obs_id")
    I1 = pm.ConstantData('information_1',data.information_1.values, dims="obs_id")
    s0 = pm.ConstantData('location_0',data.location_0.values, dims="obs_id")
    s1 = pm.ConstantData('location_1',data.location_1.values, dims="obs_id")
    
    choice = pm.ConstantData('choice',data.first_free_choice.values, dims="obs_id")
    
    inv_sig = pm.HalfNormal(name="beta", sigma=10, dims=('conditions', 'subj_id_unique')) #typical softmax temperature prior
    alpha = pm.Normal('alpha', mu=0, sigma=10, dims=('conditions', 'subj_id_unique')) #typical linear regression weight prior.
    B = pm.Normal('B', mu=0, sigma=10, dims=('conditions', 'subj_id_unique'))
    
    v1 = pm.Deterministic('v1',R0-R1 + alpha*(I0-I1) + B*(s0-s1))
    # print(pm.draw(I0-I1).shape)
    p1 = pm.Deterministic('p1', pm.math.sigmoid(-v1*inv_sig))

    obs = pm.Bernoulli("obs",p=p1, observed=choice, dims='obs_id')

I think I fixed it by adding [condition_idxs, subj_id_idxs] after the parameters! But somehow it is now taking a really long time to fit. The progress bar suddenly stopped displaying, so I don't have a time estimate, but it has already taken over 4 hours. Is this normal for a model as simple as this?
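
For reference, the changed lines look roughly like this (just a sketch of what I described above, the exact lines may differ slightly):

    v1 = pm.Deterministic('v1', R0 - R1
                          + alpha[condition_idxs, subj_id_idxs] * (I0 - I1)
                          + B[condition_idxs, subj_id_idxs] * (s0 - s1))
    p1 = pm.Deterministic('p1', pm.math.sigmoid(-v1 * inv_sig[condition_idxs, subj_id_idxs]))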

In cases like this, I would strongly suggest you start by generating some synthetic data so that you know that your model mirrors the generating process and know the true values of each parameter. Then you can start with a small data set and see how things work out before tackling larger, more realistic (but still synthetic) data sets. Then you should have a bit more confidence when you finally turn to your real data.
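
For example, something along these lines; the names and values here are placeholders I made up, not your actual columns:

    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(0)
    n_subj, n_cond, n_trials = 5, 2, 24              # start small before scaling up

    rows = []
    for s in range(n_subj):
        alpha_true = rng.normal(0, 1, size=n_cond)   # known "true" per-condition weights
        for c in range(n_cond):
            x = rng.normal(0, 1, size=n_trials)      # stand-in predictor (e.g. I0 - I1)
            p = 1 / (1 + np.exp(-alpha_true[c] * x)) # same sigmoid link the model assumes
            rows.append(pd.DataFrame({"id": s, "conditions": c,
                                      "x": x, "choice": rng.binomial(1, p)}))

    data = pd.concat(rows, ignore_index=True)

If the model recovers alpha_true on a small set like that, you can scale up the number of subjects and trials (and then switch to the real data) with more confidence.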