Context:
I am building a basic hierarchical model to recover the true value of a parameter where entities are tested by “series” (batches). All entities in the same series are offset by the same amount. A common standard entity is present in all batches, which can be used to recover the true value (mu and sigma). This information can infer the series offset.
I initially created a hierarchical model that samples fine and tested it with simulated data.
Initial Model Code:
cpd_idx, cpd_id = pd.factorize(sim_ED50_data.compound_id)
test_plan_idx, test_plan_id = pd.factorize(sim_ED50_data.series_id)
obs_id = np.unique(sim_ED50_data.index)
obs_idx = obs_id
moa_idx, moa_id = pd.factorize(sim_ED50_data.moa_name)
coords_model = {"cpd_id": cpd_id,
"test_plan_id": test_plan_id,
"obs_id": obs_id}
with pm.Model(coords=coords_model) as test_plan_var_fit_v4:
# hyperprios
sigma_hyperprior = pm.Gamma("sigma_hyperprior", alpha=5, beta=10)
logED50_true_mu = pm.Normal("logED50_true_mu", mu=2.5, sigma=0.2)
logED50_true_sigma = pm.Gamma("logED50_true_sigma", alpha=8, beta=10)
# nor hiearchical priors
series_offset = pm.Normal("series_offset", mu=0, sigma=0.5, dims=["test_plan_id"])
# non-centered hierarchical prior
logED50_true_z = pm.Normal("logED50_true_z", mu=0, sigma=1, dims="cpd_id")
logED50_true = pm.Deterministic("logED50_true", logED50_true_mu + logED50_true_sigma * logED50_true_z, dims="cpd_id")
# linear regression
mu = pm.Deterministic("mu", series_offset[test_plan_idx] + logED50_true[cpd_idx], dims="obs_id" )
# likelihood
logED50_obs = pm.Normal("logED50_obs", mu=mu, sigma=sigma_hyperprior, observed=sim_ED50_data.test_ec50, dims="obs_id")
pm.model_to_graphviz(test_plan_var_fit_v4)
Initial Model Diagram:
Problem:
I want to change the hyperprior definition to introduce grouping of compound information for better estimates.
Updated Model Code:
coords_model_v5 = {"cpd_id": cpd_id,
"test_plan_id": test_plan_id,
"moa_id": moa_id,
"obs_id": obs_id}
with pm.Model(coords=coords_model_v5) as test_plan_var_fit_v5:
# hyperprios
sigma_hyperprior = pm.Gamma("sigma_hyperprior", alpha=2, beta=5)
# MoA hyperpriors
logED50_true_mu = pm.Normal("logED50_true_mu", mu=2.5, sigma=0.2, dims="moa_id")
logED50_true_sigma = pm.Gamma("logED50_true_sigma", alpha=2, beta=5, dims="moa_id")
# non-hierarchical prior
series_offset = pm.Normal("series_offset", mu=0, sigma=0.5, dims="test_plan_id")
logED50_true_z = pm.Normal("logED50_true_z", mu=0, sigma=1, dims="cpd_id")
logED50_true = pm.Deterministic("logED50_true", logED50_true_mu[moa_idx] + logED50_true_sigma[moa_idx] * logED50_true_z[cpd_idx], dims="cpd_id")
# linear
mu = pm.Deterministic("mu", series_offset[test_plan_idx] + logED50_true[cpd_idx], dims="obs_id" )
# likelihood
logED50_obs = pm.Normal("logED50_obs", mu=mu, sigma=sigma_hyperprior, observed=sim_ED50_data.test_ec50, dims="obs_id")
pm.model_to_graphviz(test_plan_var_fit_v5)
ppc_test_plan_var_fit_v5 = pm.sample_prior_predictive(samples=1,
model=test_plan_var_fit_v5,
random_seed=None,
var_names=["logED50_true_sigma", "logED50_true_mu", "series_offset", "sigma_hyperprior", "mu"],
)
with test_plan_var_fit_v5:
idata_test_plan_var_fit_v5 = pm.sample(tune=4000,
draws=1000,
chains=4,
target_accept=0.9,
random_seed=None,
nuts_sampler='numpyro',
return_inferencedata=True,
idata_kwargs=dict(log_likelihood = False)
Updated Model Diagram:
The prior predictive check runs fine. Sampling seems to run well, but before the inference data is returned, I encounter this error:
ValueError: conflicting sizes for dimension 'cpd_id': length 175 on the data but length 141 on coordinate 'cpd_id'
I’ve tried various solutions without success. Any guidance on resolving this error would be greatly appreciated.
Additionally, if you have suggestions on the modeling approach itself, I welcome your input.
Thank you in advance for your help!