Wrong posterior in multidimensional GaussianRandomWalk

Hello, I read several posts and tried different approaches before writing this one, but I have been stuck for a while now: I cannot make a simple time-series model work when the variables have more than one dimension.

Below is an example of a simple HMM with a latent variable Z that evolves over time with Gaussian noise, and an observed variable X with its own emission noise.

import numpy as np
import pymc as pm
import arviz as az
import matplotlib.pyplot as plt
from scipy.stats import norm

T = 50  # number of time steps
F = 2  # number of features (dimension of each state/observation)

sd_z = 1  # transition noise (single noise for all features)
sd_x = 1  # emission noise (single noise for all features)

# Data generation

np.random.seed(0)
x = np.zeros((T, F))
z = np.zeros((T, F))
for t in range(T):
    if t == 0:
        z[0] = norm(loc=0, scale=1).rvs(size=F)
    else:
        z[t] = norm(loc=z[t - 1], scale=sd_z).rvs()
    x[t] = norm(loc=z[t], scale=sd_x).rvs()

# Model definition and posterior inference

coords = {"time": np.arange(T), "features": np.arange(F)}
hmm = pm.Model(coords=coords)
with hmm:
    x_obs = pm.MutableData("x_obs", x, dims=("time", "features"))

    # Standard deviation of the transition and emission distributions
    i_sd_z = pm.HalfNormal(name="i_sd_z", sigma=1)
    i_sd_x = pm.HalfNormal(name="i_sd_x", sigma=1)

    # Transition distribution of latent variable Z
    i_z = pm.GaussianRandomWalk(name="i_z", init_dist=pm.Normal.dist(mu=0, sigma=1, shape=(1, F)),
                                mu=0, sigma=i_sd_z, dims=("time", "features"))

    # Emission distribution of the observed variable X
    i_x = pm.Normal(name="i_x", mu=i_z, sigma=i_sd_x, observed=x_obs)
    
    idata = pm.sample(1000, init="adapt_diag", tune=1000, chains=2, random_seed=0)

# Plotting results

az.plot_trace(idata, var_names=["i_sd_z", "i_sd_x"])
plt.show()

m1 = idata.posterior["i_z"].sel(features=0).mean(dim=["chain", "draw"])
sd1 = idata.posterior["i_z"].sel(features=0).std(dim=["chain", "draw"])

plt.figure(figsize=(15, 8))
plt.plot(range(T), z[:, 0], label="Real", color="tab:blue", marker="o")
plt.plot(range(T), m1, label="Inferred", color="tab:orange", marker="o")
plt.fill_between(range(T), m1 - sd1, m1 + sd1, color="tab:orange", alpha=0.4)
plt.legend()
plt.show()

You can see from the plots below that the inferred values of the standard deviations are not correct, and neither are the samples of the latent variable Z. The problem persists even if I set F to 1. However, if I remove the “features” dimension entirely, everything works well, so it seems the problem is with the indexing inside the GaussianRandomWalk distribution.


Python version: 3.9
PyMC version: 5.0.1

Hello!

I think the problem is that your priors are too tight, and their influence dominates the data in the posterior.

Here is the same graph as the one you showed, except with draws from the prior:

As you can see, it strictly excludes the data. In a model like this, the priors are going to play a much more important role in the posterior than in, say, a large-N cross-sectional linear model.
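If you want to reproduce that comparison, something like this should do it (a rough sketch reusing your model and plotting style from above):

with hmm:
    prior_idata = pm.sample_prior_predictive(random_seed=0)

# Prior mean and sd of the latent walk for feature 0, same style as your posterior plot
m_prior = prior_idata.prior["i_z"].sel(features=0).mean(dim=["chain", "draw"])
sd_prior = prior_idata.prior["i_z"].sel(features=0).std(dim=["chain", "draw"])

plt.figure(figsize=(15, 8))
plt.plot(range(T), z[:, 0], label="Real", color="tab:blue", marker="o")
plt.plot(range(T), m_prior, label="Prior mean", color="tab:orange", marker="o")
plt.fill_between(range(T), m_prior - sd_prior, m_prior + sd_prior, color="tab:orange", alpha=0.4)
plt.legend()
plt.show()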

Here are draws from the prior of the same model, with sigma=5 on both standard deviation priors and sigma=10 for the standard deviation of the init_dist distribution:
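Concretely, the adjustment I mean is just this (a sketch; hmm_loose is a throwaway name, everything else is as in your model):

with pm.Model(coords=coords) as hmm_loose:
    x_obs = pm.MutableData("x_obs", x, dims=("time", "features"))

    # Looser priors: sigma=5 on both noise scales, sigma=10 on the initial state
    i_sd_z = pm.HalfNormal("i_sd_z", sigma=5)
    i_sd_x = pm.HalfNormal("i_sd_x", sigma=5)

    i_z = pm.GaussianRandomWalk("i_z", init_dist=pm.Normal.dist(mu=0, sigma=10, shape=(1, F)),
                                mu=0, sigma=i_sd_z, dims=("time", "features"))

    i_x = pm.Normal("i_x", mu=i_z, sigma=i_sd_x, observed=x_obs)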

After adjusting the priors, I get the following posterior for i_z[0]:

Hi @jessegrabowski, thank you for your answer. That might be another problem I did not foresee, but I believe there’s still a problem with the indexing. For instance, if I remove “features” from coords and dims, and leave just the “time” dimension, I get the following:

If I instead just set F to 1, which should be essentially the same, I get degenerate posterior samples for Z:

Yes, I think you are right. The prior standard deviation should grow like \sqrt{t}, but you can see from the prior plot I posted above that it is constant. As you suggest, when I remove the features dimension, the variance grows as expected. I wonder if the wrong axis is being used to take the cumulative sum?
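For reference, the expected growth just comes from summing independent increments: for z_t = z_{t-1} + \epsilon_t with \epsilon_t \sim N(0, \sigma_z^2) and z_0 \sim N(0, 1),

\mathrm{Var}(z_t) = \mathrm{Var}(z_0) + t\,\sigma_z^2 = 1 + t\,\sigma_z^2,

so the prior standard deviation should fan out roughly like \sigma_z \sqrt{t} rather than staying flat.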

When I do the GRW “by hand”, I get the right result, both in the prior and the posterior:

coords = {"time": np.arange(T), "features": np.arange(F)}
hmm = pm.Model(coords=coords)
with hmm:
    x_obs = pm.MutableData("x_obs", x, dims=("time", 'features'))

    # Standard deviation of the transition and emission distributions
    i_sd_z = pm.HalfNormal(name="i_sd_z", sigma=1)
    i_sd_x = pm.HalfNormal(name="i_sd_x", sigma=1)

    # Transition distribution of latent variable Z, built "by hand"
    i_z_0 = pm.Normal('i_z_0', sigma=1, dims=['features'])
    i_z_t = pm.Normal('i_z_t', sigma=i_sd_z, dims=['time', 'features'])
    i_z = pm.Deterministic(
        'i_z',
        pm.math.concatenate([i_z_0[None], i_z_t], axis=0).cumsum(axis=0)[1:],
        dims=['time', 'features'],
    )

    # Emission distribution of the observed variable X
    i_x = pm.Normal(name="i_x", mu=i_z, sigma=i_sd_x, observed=x_obs)
    idata = pm.sample_prior_predictive()
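A quick way to see the growth directly from those prior draws (the reference line is only approximate, since i_sd_z is itself drawn from its prior here):

# Prior sd of i_z should increase along the time axis, roughly like sqrt(t)
prior_sd = idata.prior["i_z"].sel(features=0).std(dim=["chain", "draw"])

plt.plot(np.arange(T), prior_sd, label="prior sd of i_z (feature 0)")
plt.plot(np.arange(T), np.sqrt(2 + np.arange(T)), "--", label="approx. sqrt(t) reference")
plt.legend()
plt.show()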

My guess is that the prior issue I pointed out above is actually irrelevant, and that there is something else going on in the random walk wrapper. @ricardoV94 ?


The GaussianRandomWalk requires the time dimension to be at the rightmost position. That's the axis over which the steps evolve.


Thank you, @ricardoV94. Indeed, I swapped the “features” and “time” dimensions and now I get the expected result. It would be very helpful to include that information in the documentation of GaussianRandomWalk (and the MV alternative). I could not find any mention of it and I am afraid others might face the same issue in the future.
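For anyone finding this later, the change is roughly this (a sketch keeping the original priors; the data array just gets transposed so that time is the last axis):

coords = {"features": np.arange(F), "time": np.arange(T)}
with pm.Model(coords=coords) as hmm:
    # Time must be the rightmost dimension: that is the axis the walk steps over
    x_obs = pm.MutableData("x_obs", x.T, dims=("features", "time"))

    i_sd_z = pm.HalfNormal("i_sd_z", sigma=1)
    i_sd_x = pm.HalfNormal("i_sd_x", sigma=1)

    # One initial value per feature
    i_z = pm.GaussianRandomWalk("i_z", init_dist=pm.Normal.dist(mu=0, sigma=1, shape=F),
                                mu=0, sigma=i_sd_z, dims=("features", "time"))

    i_x = pm.Normal("i_x", mu=i_z, sigma=i_sd_x, observed=x_obs)
    idata = pm.sample(1000, init="adapt_diag", tune=1000, chains=2, random_seed=0)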

Totally agree. Would you like to open a pull request for that? This applies to all time series distributions.


Hi @jessegrabowski, I replicated your solution to perform the GRW by hand and indeed I get the correct posterior for Z, but I get weird inferences with a bunch of divergences for the standard deviation parameters (see image below). Do you have any idea why this is the case?

@ricardoV94’s answer fixed the GaussianRandomWalk case. This is what I get when I put time in the last dimension.

However, I need to implement a custom RW that depends on other variables, not only on the previous value in the same chain, so I will need to construct the RW by hand as you did in your example.

Thank you!

@ricardoV94, will do.

You could try a lower target_accept, or adjusting the priors. These are the usual superficial solutions to divergences. Deeper solutions require thinking about why the model might not be a good fit for the data and making structural adjustments.

The handling of x0 can also be improved by removing the concatenation and slicing, which might help. Note that the cumulative sums x_0, x_0 + x_1, x_0 + x_1 + x_2, ... all contain the same x_0 term, so it can just be broadcast-added onto the accumulated increments; all the concatenation stuff reduces to i_z_0 + i_z_t.cumsum(axis=0), as sketched below.
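In the snippet above, the i_z_0 / i_z_t / i_z lines would then become roughly:

i_z_0 = pm.Normal('i_z_0', sigma=1, dims=['features'])
i_z_t = pm.Normal('i_z_t', sigma=i_sd_z, dims=['time', 'features'])

# Initial value broadcast-added onto the cumulative sum of the increments;
# no concatenation or slicing needed
i_z = pm.Deterministic('i_z', i_z_0 + i_z_t.cumsum(axis=0), dims=['time', 'features'])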


@jessegrabowski, got it. It makes sense. Thank you for your assistance.

Should have written higher target_accept, not lower. Sorry.