I am learning MCMC methods and currently playing with the following simple model. (This is just something that I isolated from a larger model as particularly problematic.)
There is a latent variable U \sim \rm N(0, s) which is observed twice with different noise: X \sim \rm N(U, \sigma_x) and Y \sim \rm N(U, \sigma_y). My goal is to recover the constants s, \sigma_x and \sigma_y from the observations X and Y. My data generating process is as follows:
import pymc as pm
import arviz as az
import numpy as np

N = 1000
u = np.random.normal(0, 20, size=N)      # latent U, so s = 20
x = u + np.random.normal(0, 6, size=N)   # first observation, sigma_x = 6
y = u + np.random.normal(0, 4, size=N)   # second observation, sigma_y = 4
I have been using the following PyMC model to infer the constants:
with pm.Model() as model:
    # polar reparametrization of (sigma_x, sigma_y): r is the radius, theta the angle
    theta = pm.Beta('theta', 3, 3)
    r = pm.HalfNormal('r', 10)
    sigmax = pm.Deterministic('sigmax', r * pm.math.sin(theta * np.pi / 2))
    sigmay = pm.Deterministic('sigmay', r * pm.math.cos(theta * np.pi / 2))
    s = pm.HalfNormal('s', 20)
    U = pm.Normal('U', mu=0, sigma=s, shape=N)   # one latent value per observation
    X = pm.Normal('X', mu=U, sigma=sigmax, observed=x)
    Y = pm.Normal('Y', mu=U, sigma=sigmay, observed=y)

with model:
    trace = pm.sample(2000, tune=2000, target_accept=0.9)
Here’s the picture:
As you can see, I have reparametrized using polar coordinates in the (\sigma_x, \sigma_y)-plane, but other than that the model is very straightforward. I am also using a Beta(3, 3) prior on theta to gently steer the model toward \sigma_x = \sigma_y (i.e. theta = 1/2, since \sin(\pi/4) = \cos(\pi/4)) in the absence of better evidence.
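A quick way to see what this prior implies for (\sigma_x, \sigma_y) is to draw from the prior; the Deterministic variables show up in the prior group:

# draw from the prior to see what Beta(3, 3) on theta implies for sigma_x and sigma_y
with model:
    prior = pm.sample_prior_predictive(1000)
print(az.summary(prior.prior, var_names=['sigmax', 'sigmay'], kind='stats'))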
My reason for reparametrizing is half empirical. Note that \rm Var[X] = s^2 + \sigma_x^2, \rm Var[Y] = s^2 + \sigma_y^2 and \rm Var[X - Y] = \sigma_x^2 + \sigma_y^2. These three equations can be solved for s, \sigma_x and \sigma_y, so at least in principle the model should be identifiable. (Is this correct?) From the sampling distributions I see that \sigma_x^2 + \sigma_y^2 can be determined with much greater accuracy than either \sigma_x or \sigma_y separately, which is why I have reparametrized in this manner.
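Concretely, the moment-matching check looks like this on the simulated data above (the variable names are mine):

# solve Var[X] = s^2 + sigma_x^2, Var[Y] = s^2 + sigma_y^2,
# Var[X - Y] = sigma_x^2 + sigma_y^2 for the three unknowns
vx, vy, vd = np.var(x), np.var(y), np.var(x - y)
s2 = (vx + vy - vd) / 2   # add the first two equations, subtract the third
sx2 = vx - s2
sy2 = vy - s2
print(np.sqrt([s2, sx2, sy2]))   # should come out near (20, 6, 4)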
My results typically look like this:
The estimates themselves are reasonable, and one cannot really expect much more precision in this case.
However, my main problem is that the model really struggles to sample theta, and its ESS is comparatively low. From the trace plot of theta I have the impression that the problem is slow mixing. Other than that there doesn't seem to be a real problem: if I sample 10 chains for 10k draws each, \hat{R} improves and the posterior mean approaches the value found with the naive statistical reasoning sketched above. All in all the model does exactly what I expect; it's just strange that the ESS for theta is so low compared to r and s.
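For reference, the diagnostics I am looking at come from plain ArviZ calls like these:

# compare ESS and R-hat across the scalar parameters
print(az.summary(trace, var_names=['theta', 'r', 's'], kind='diagnostics'))

# inspect mixing of theta
az.plot_trace(trace, var_names=['theta'])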
So I have come looking for advice. I have reparametrized in many other ways, fiddled with the settings of the NUTS sampler, and tried both more restrictive and looser priors, but so far to little effect. Is there any hope of overcoming this issue? Anything I can investigate? Or is there some deeper reason why it is not possible?
Trace of theta:
Edit (some later observations): If I remove the variable s from the equations by simply telling the model its true value, the issue does not improve. However, as soon as I make changes to eliminate U (which is possible once s is fixed), the problem goes away. This further suggests to me that this is really a problem of sampling in a high-dimensional space and not one of identifiability.
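For concreteness, the U-free variant looks roughly like this. With s fixed at its true value, integrating U out makes each pair (X_i, Y_i) bivariate normal with \rm Cov[X, Y] = s^2 (the pytensor import assumes a recent PyMC; the variable names are mine):

import pytensor.tensor as pt

s0 = 20.0                        # s assumed known here
xy = np.stack([x, y], axis=1)    # observations as (N, 2) pairs

with pm.Model() as marginal_model:
    theta = pm.Beta('theta', 3, 3)
    r = pm.HalfNormal('r', 10)
    sigmax = pm.Deterministic('sigmax', r * pm.math.sin(theta * np.pi / 2))
    sigmay = pm.Deterministic('sigmay', r * pm.math.cos(theta * np.pi / 2))
    # with U integrated out: Var[X] = s0^2 + sigma_x^2, Var[Y] = s0^2 + sigma_y^2,
    # Cov[X, Y] = s0^2
    c = pt.as_tensor_variable(s0**2)
    cov = pt.stack([pt.stack([c + sigmax**2, c]),
                    pt.stack([c, c + sigmay**2])])
    obs = pm.MvNormal('obs', mu=np.zeros(2), cov=cov, observed=xy)
    marginal_trace = pm.sample(2000, tune=2000, target_accept=0.9)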