Reparameterizing in what sense? Forgive my ignorance.
Reparametrizing means changing your model, basically. Sometimes you get divergences because the choice of priors is not suitable for the model, etc. I played around with the priors a bit and could bring the divergences down to a level where they are not happening at all, or are very few (1 in 20000 in one instance, 24 in another), even when I remove the target accept. So perhaps a drastic change to the model is not necessary. In the previous code I posted I forgot to include the data generation step, so here is the final model I have used (with slightly changed variable names so I can keep them in my head more easily):
import numpy as np
import pymc as pm
import arviz as az

seed = 5
rng = np.random.default_rng(seed)

# true values used to generate the synthetic data
cov_sigma_real = 1
cov_mu_real = 0.05
group_CoV_real = rng.lognormal(np.log(cov_mu_real), cov_sigma_real, size=4)

samples = 50
means_real = [20, 150, 78, 65]
group_vals = np.zeros((samples, len(means_real)))
group_labels = np.zeros((samples, len(means_real)), dtype=int)
for i in range(4):
    group_vals[:, i] = np.random.normal(means_real[i], means_real[i] * group_CoV_real[i],
                                        size=samples)
    group_labels[:, i] = i

group_idx = group_labels.T.flatten()
val = group_vals.T.flatten()

coords = {"groups": [0, 1, 2, 3],
          "obs": range(len(group_idx))}
with pm.Model(coords=coords) as HCoV_model:
    cov_sigma = pm.HalfNormal("cov_sigma", 1)
    cov_mu = pm.HalfNormal("global_CoV", 0.1)
    group_CoV = pm.LogNormal("group_CoV", mu=np.log(cov_mu),
                             sigma=cov_sigma, dims="groups")
    group_mean = pm.LogNormal("group_mean", mu=np.log(80), sigma=1,
                              dims="groups")
    sigma = pm.Deterministic("sigma", group_mean * group_CoV)
    y_obs = group_mean[group_idx]
    sigma_obs = sigma[group_idx]
    y = pm.Normal("y", mu=y_obs, sigma=sigma_obs, observed=val, dims="obs")
    trace = pm.sample(tune=5000, draws=5000, random_seed=rng)

with HCoV_model:
    ppc = pm.sample_posterior_predictive(trace)

az.plot_ppc(ppc)
ax = az.plot_posterior(trace, var_names=["cov_sigma", "global_CoV",
                                         "group_CoV", "group_mean"])
ax = ax.flatten()
ax[0].scatter(cov_sigma_real, 0, marker='o', color='red')
ax[1].scatter(cov_mu_real, 0, marker='o', color='red')
for i in range(4):
    ax[i+2].scatter(group_CoV_real[i], 0, marker='o', color='red')
    ax[i+6].scatter(means_real[i], 0, marker='o', color='red')
ax[0].get_figure().tight_layout()
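As for what a reparametrization could look like in practice here: one common option is to sample the group-level CoV in non-centered form, i.e. draw a standard-normal offset and transform it, which is mathematically the same LogNormal as above but often easier for NUTS when divergences show up. A minimal sketch (I have not run this variant on this exact data, so treat it as illustrative):

with pm.Model(coords=coords) as HCoV_noncentered:
    cov_sigma = pm.HalfNormal("cov_sigma", 1)
    cov_mu = pm.HalfNormal("global_CoV", 0.1)
    # non-centered parametrization: equivalent to LogNormal(log(cov_mu), cov_sigma)
    z = pm.Normal("z", 0, 1, dims="groups")
    group_CoV = pm.Deterministic("group_CoV",
                                 pm.math.exp(pm.math.log(cov_mu) + cov_sigma * z),
                                 dims="groups")
    group_mean = pm.LogNormal("group_mean", mu=np.log(80), sigma=1, dims="groups")
    sigma = pm.Deterministic("sigma", group_mean * group_CoV, dims="groups")
    y = pm.Normal("y", mu=group_mean[group_idx], sigma=sigma[group_idx],
                  observed=val, dims="obs")
    trace_nc = pm.sample(tune=5000, draws=5000, random_seed=rng)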
posterior variance would be in some sense double counting mean variance.
So if both the group mean and the sd are involved in calculating the observation sigma, then obviously this creates more wiggle room for these parameters to change: by scaling one down and the other up you get the same product, so an identifiability problem. However, there are other places where the scale of the group means gets pinned down, which probably remedies the identifiability problem: for instance, it appears as the mean of your likelihood, so any attempt to scale it up or down from its expected value would also show up there and decrease the likelihood. So as long as the group mean is identified reasonably certainly by being the mean of your likelihood, I think the sd would be fine too. Consider, for instance, trying to solve the following set of equations (where x0 and x1 are given and a and b are to be determined):
a*b = 10
x0 < a < x1
The more precise the second condition is, the more precise your b is going to be. If you look at the posteriors, the group means are identified quite certainly, so I wouldn’t worry too much about it.
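As a quick numeric illustration of that point (the interval bounds here are made up): the tighter the interval on a, the tighter the implied interval on b = 10/a:

for x0, x1 in [(2.0, 8.0), (4.5, 5.5), (4.9, 5.1)]:
    # hypothetical bounds on a; b is then pinned down by a*b = 10
    print(f"a in ({x0}, {x1}) -> b in ({10 / x1:.2f}, {10 / x0:.2f})")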
I’m not sure I understand this correction, could you expand on this a little?
Simply using
y = pm.LogNormal("y", mu=pm.math.log(y_obs), sigma=group_CoV[group_idx],
                 observed=val, dims="obs")
for your likelihood also seems to give reasonable results, the full model being:
with pm.Model(coords=coords) as HCoV_model:
    cov_sigma = pm.HalfNormal("cov_sigma", 1)
    cov_mu = pm.HalfNormal("global_CoV", 0.1)
    group_CoV = pm.LogNormal("group_CoV", mu=np.log(cov_mu),
                             sigma=cov_sigma, dims="groups")
    group_mean = pm.LogNormal("group_mean", mu=np.log(80), sigma=1,
                              dims="groups")
    y_obs = group_mean[group_idx]
    y = pm.LogNormal("y", mu=pm.math.log(y_obs), sigma=group_CoV[group_idx],
                     observed=val, dims="obs")
    trace = pm.sample(tune=5000, draws=5000, random_seed=rng)
So this might be worth a shot too. Note that this now assumes your observation noise is lognormally distributed as LogNormal(log(mu), sd), which is obviously different from Normal(mu, sd*mu). However, in the end this might fit your data better, who knows. The LogNormal as written has mean exp(log(mu) + sd^2/2) ≈ mu (since sd is small compared to mu in your case) and variance (exp(sd^2) - 1) * exp(2*log(mu) + sd^2) ≈ sd^2 * mu^2. So it serves your purposes well.
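A quick numerical check of those approximations (mu and sd here are just illustrative values, roughly in the range of this example):

import numpy as np

mu, sd = 80.0, 0.05  # illustrative values, not fitted quantities
mean_exact = np.exp(np.log(mu) + sd**2 / 2)                        # exact LogNormal mean
var_exact = (np.exp(sd**2) - 1) * np.exp(2 * np.log(mu) + sd**2)   # exact LogNormal variance
print(mean_exact, mu)                # ~80.1 vs 80
print(np.sqrt(var_exact), sd * mu)   # ~4.01 vs 4.0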
I have not tried scaling the observed data (and then inverse scaling your parameters in the end) but would be curious to know if that works too.
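In case it is useful, an untested sketch of what that could look like (scale_factor and the prior choices are my guesses, not something anyone in this thread has run): divide the observations by their overall mean before fitting, then multiply the mean-like posteriors back afterwards; the CoV itself is scale-free and needs no correction.

# untested sketch: fit on rescaled data, then undo the scaling afterwards
scale_factor = val.mean()
val_scaled = val / scale_factor

with pm.Model(coords=coords) as HCoV_scaled:
    cov_sigma = pm.HalfNormal("cov_sigma", 1)
    cov_mu = pm.HalfNormal("global_CoV", 0.1)
    group_CoV = pm.LogNormal("group_CoV", mu=pm.math.log(cov_mu),
                             sigma=cov_sigma, dims="groups")
    # group means are now of order 1 instead of order 80
    group_mean = pm.LogNormal("group_mean", mu=0, sigma=1, dims="groups")
    sigma = group_mean * group_CoV
    y = pm.Normal("y", mu=group_mean[group_idx], sigma=sigma[group_idx],
                  observed=val_scaled, dims="obs")
    trace_scaled = pm.sample(tune=5000, draws=5000, random_seed=rng)

# back-transform the group means to the original scale; group_CoV needs no correction
group_mean_orig = trace_scaled.posterior["group_mean"] * scale_factor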