Hierarchical Modeling Issue: Conflicting Dimension Sizes in Group-Dependent Hyperprior

Context:

I am building a basic hierarchical model to recover the true value of a parameter where entities are tested by “series” (batches). All entities in the same series are offset by the same amount. A common standard entity is present in all batches, which can be used to recover the true value (mu and sigma). This information can infer the series offset.

I initially created a hierarchical model that samples fine and tested it with simulated data.

Initial Model Code:

cpd_idx, cpd_id = pd.factorize(sim_ED50_data.compound_id)
test_plan_idx, test_plan_id  = pd.factorize(sim_ED50_data.series_id)
obs_id = np.unique(sim_ED50_data.index)
obs_idx = obs_id
moa_idx, moa_id = pd.factorize(sim_ED50_data.moa_name)
coords_model = {"cpd_id": cpd_id,
                   "test_plan_id": test_plan_id,
                   "obs_id": obs_id}

with pm.Model(coords=coords_model) as test_plan_var_fit_v4:
    # hyperprios
    sigma_hyperprior = pm.Gamma("sigma_hyperprior", alpha=5, beta=10)
    logED50_true_mu = pm.Normal("logED50_true_mu", mu=2.5, sigma=0.2)
    logED50_true_sigma = pm.Gamma("logED50_true_sigma", alpha=8, beta=10)
    
    # nor hiearchical priors
    series_offset = pm.Normal("series_offset", mu=0, sigma=0.5, dims=["test_plan_id"])
    
    # non-centered hierarchical prior
    logED50_true_z = pm.Normal("logED50_true_z", mu=0, sigma=1, dims="cpd_id")
    logED50_true = pm.Deterministic("logED50_true", logED50_true_mu + logED50_true_sigma * logED50_true_z, dims="cpd_id")

    # linear regression
    mu = pm.Deterministic("mu", series_offset[test_plan_idx] + logED50_true[cpd_idx], dims="obs_id" )
    
    # likelihood
    logED50_obs = pm.Normal("logED50_obs", mu=mu, sigma=sigma_hyperprior, observed=sim_ED50_data.test_ec50, dims="obs_id")
    
pm.model_to_graphviz(test_plan_var_fit_v4)

Initial Model Diagram:

Problem:

I want to change the hyperprior definition to introduce grouping of compound information for better estimates.

Updated Model Code:

coords_model_v5 = {"cpd_id": cpd_id,
                   "test_plan_id": test_plan_id,
                   "moa_id": moa_id,
                   "obs_id": obs_id}

with pm.Model(coords=coords_model_v5) as test_plan_var_fit_v5:
    # hyperprios
    sigma_hyperprior = pm.Gamma("sigma_hyperprior", alpha=2, beta=5)
    
    # MoA hyperpriors
    logED50_true_mu = pm.Normal("logED50_true_mu", mu=2.5, sigma=0.2, dims="moa_id")
    logED50_true_sigma = pm.Gamma("logED50_true_sigma",  alpha=2, beta=5, dims="moa_id")
    
    # non-hierarchical prior
    series_offset = pm.Normal("series_offset", mu=0, sigma=0.5, dims="test_plan_id")

    logED50_true_z = pm.Normal("logED50_true_z", mu=0, sigma=1, dims="cpd_id")
    logED50_true = pm.Deterministic("logED50_true", logED50_true_mu[moa_idx] + logED50_true_sigma[moa_idx] * logED50_true_z[cpd_idx], dims="cpd_id")

    # linear
    mu = pm.Deterministic("mu", series_offset[test_plan_idx] + logED50_true[cpd_idx], dims="obs_id" )
    
    # likelihood
    logED50_obs = pm.Normal("logED50_obs", mu=mu, sigma=sigma_hyperprior, observed=sim_ED50_data.test_ec50, dims="obs_id")
    
pm.model_to_graphviz(test_plan_var_fit_v5)

ppc_test_plan_var_fit_v5 = pm.sample_prior_predictive(samples=1,
                                             model=test_plan_var_fit_v5,
                                             random_seed=None,
                                             var_names=["logED50_true_sigma", "logED50_true_mu", "series_offset", "sigma_hyperprior", "mu"],
                                                      )
with test_plan_var_fit_v5:
    idata_test_plan_var_fit_v5 = pm.sample(tune=4000,
                              draws=1000,
                              chains=4,
                              target_accept=0.9,
                              random_seed=None,
                              nuts_sampler='numpyro', 
                              return_inferencedata=True,
                              idata_kwargs=dict(log_likelihood = False)


Updated Model Diagram:

The prior predictive check runs fine. Sampling seems to run well, but before the inference data is returned, I encounter this error:

ValueError: conflicting sizes for dimension 'cpd_id': length 175 on the data but length 141 on coordinate 'cpd_id'

I’ve tried various solutions without success. Any guidance on resolving this error would be greatly appreciated.

Additionally, if you have suggestions on the modeling approach itself, I welcome your input.

Thank you in advance for your help!