Hi everyone,
I’m working on a difference-in-differences analysis in which each observation comes with a reported standard error, and I want to incorporate this measurement uncertainty into my model. The diff-in-diff model itself is fairly standard (although I have multiple treatments); the unusual part is the error model.
I tried to follow the measurement-error example from McElreath’s book, implemented here: statistical-rethinking-2023/Lecture 17 - Measurement and Misclassification.ipynb at main · dustinstansbury/statistical-rethinking-2023 · GitHub, with the measurement errors formulated as an additional Normal. In my model this looks like:
# Diff-in-diff mean for each observation
mu = pm.Deterministic(
    "mu",
    control_intercept + <diff_in_diff_stuff>,
    dims=["obs__"],
)
# Scatter of the latent true values around the diff-in-diff mean
scale = pm.Exponential("scale", lam=10)
# Latent true value for each observation
obs = pm.Normal("obs", mu=mu, sigma=scale, dims=["obs__"])
# Measurement model: reported values with known standard errors
obs_star = pm.Normal("obs*", mu=obs, sigma=values_se, observed=values, dims=["obs__"])
This does not converge even with target_accept=0.999; the sampler throws divergences near zero values of sigma (the scale parameter).
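To see where the divergences sit, I plot scale against one of the latent values with the divergent transitions highlighted. A rough sketch, assuming the trace from pm.sample() is stored in idata and the obs__ coordinate is integer-indexed:

import arviz as az

# Pair plot of the latent-scatter scale against one latent value, with
# divergent transitions highlighted; in the centered model these cluster
# near scale = 0, the classic funnel geometry.
az.plot_pair(
    idata,
    var_names=["scale", "obs"],
    coords={"obs__": [0]},  # restrict to a single latent value for readability
    divergences=True,
)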
To improve convergence I reformulated the model with a non-centered parameterization:
mu = pm.Deterministic(
    "mu",
    control_intercept + <diff_in_diff_stuff>,
    dims=["obs__"],
)
# Non-centered parameterization: one standard-normal offset per observation
# (dims=["obs__"] is needed so each latent value gets its own offset)
error = pm.Normal("error", 0, 1, dims=["obs__"])
scale = pm.Exponential("scale", lam=10)
obs = pm.Deterministic("obs", mu + scale * error, dims=["obs__"])
# centered equivalent: obs = pm.Normal("obs", mu=mu, sigma=scale, dims=["obs__"])
obs_star = pm.Normal("obs*", mu=obs, sigma=values_se, observed=values, dims=["obs__"])
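In case it helps, here is a self-contained toy version of the non-centered model that samples end to end. The treated/post predictors, the coefficients, and the simulated values/values_se are made-up stand-ins for my real data and for <diff_in_diff_stuff>:

import numpy as np
import pymc as pm

# Hypothetical toy data: `values` are reported point estimates,
# `values_se` their known standard errors.
rng = np.random.default_rng(42)
n = 40
treated = rng.integers(0, 2, n)   # made-up treatment indicator
post = rng.integers(0, 2, n)      # made-up post-period indicator
values_se = rng.uniform(0.1, 0.5, n)
true_vals = 1.0 + 0.5 * treated + 0.3 * post + 0.8 * treated * post + rng.normal(0, 0.2, n)
values = true_vals + rng.normal(0, values_se)

coords = {"obs__": np.arange(n)}

with pm.Model(coords=coords) as model:
    # Stand-in diff-in-diff linear predictor
    control_intercept = pm.Normal("control_intercept", 0, 1)
    b_treat = pm.Normal("b_treat", 0, 1)
    b_post = pm.Normal("b_post", 0, 1)
    b_did = pm.Normal("b_did", 0, 1)
    mu = pm.Deterministic(
        "mu",
        control_intercept + b_treat * treated + b_post * post + b_did * treated * post,
        dims="obs__",
    )

    # Non-centered latent true values: one standard-normal offset per observation
    error = pm.Normal("error", 0, 1, dims="obs__")
    scale = pm.Exponential("scale", lam=10)
    obs = pm.Deterministic("obs", mu + scale * error, dims="obs__")

    # Measurement model: reported values are noisy views of the latent truth
    pm.Normal("obs*", mu=obs, sigma=values_se, observed=values, dims="obs__")

    idata = pm.sample(target_accept=0.95)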
Sampling improved drastically, although I still need fairly high values of target_accept to get rid of occasional divergences here and there.
However, it seems that scale is not learned from my dataset at all.
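To check this, I compare the prior of scale against its posterior; near-complete overlap would mean the data contribute nothing beyond the Exponential(10) prior. A sketch, assuming the model and idata from the toy example above:

import arviz as az
import pymc as pm

with model:
    # Add prior draws to the InferenceData so prior and posterior can be compared
    idata.extend(pm.sample_prior_predictive())

# Plots prior vs. posterior for `scale`; if the two distributions nearly
# coincide, the prior is doing essentially all the work.
az.plot_dist_comparison(idata, var_names=["scale"])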
Am I doing everything correctly here?