Problems with Gaussian Random Walk Model

Hi everyone. I’m currently trying to model some EEG data via a Gaussian random walk (GRW). In particular, I have some difference waves from event-related potential (ERP) epochs in a 32 × 2 × 282 array, where electrodes = 32, groups = 2, and samples = 282 (i.e. 282 timepoints, each associated with an amplitude value in microvolts). The model looks as follows:

with pm.Model() as mod:
    sd = pm.Exponential.dist(0.5, shape=E)  ##E = 32 electrodes
    chol, corr, std = pm.LKJCholeskyCov("chol", n=E, eta=2.0, sd_dist=sd, compute_corr=True)
    cov = pm.Deterministic("cov", chol.dot(chol.T))
    g_raw = GRW("g_raw", shape=(S,E,G), sigma=0.5)  ##S = 282 samples, G = 2 groups
    g = pm.Deterministic('g', tt.dot(cov, g_raw.T))
    a = pm.Normal('a', mu=0, sigma=0.5)
    m = pm.Deterministic('m', a + g)
    e = pm.Exponential('e', 0.05)+1
    y = pm.Normal("y", mu=m, sigma=e, observed=amps, shape=amps.shape)

I have sampled this model using NUTS (2000 draws, tune=2000, chains=4, init='advi', target_accept=0.95) and the "g" parameter completely fails to converge, with R_hats over 1.4. However, the ERP signals are approximated very well.

Similarly, I have tried using ADVI (20000 iterations, callbacks=[tracker, checks], where tracker tracks mean and std, and checks = CheckParametersConvergence(diff='absolute')) and it does not converge, but it approximates the signal very well. Here’s the ADVI approximation at one electrode (Pz):

And here are the results of the ADVI tracker:

Curiously, the correlations from the LKJ prior are also sensible. In EEG research the P300 ERP is commonly observed in a cluster surrounding posterior electrodes such as Pz. This is what I get from a few electrodes around Pz (which makes sense):

           P3    P4    Pz   PO3   PO4
    P3   1.00  0.42  0.50  0.53  0.46
    P4   0.42  1.00  0.59  0.53  0.56
    Pz   0.50  0.59  1.00  0.59  0.58
    PO3  0.53  0.53  0.59  1.00  0.55
    PO4  0.46  0.56  0.58  0.55  1.00

My impression is that my model is fundamentally flawed, but I cannot figure out where the problem is. Maybe a GRW is not appropriate for this type of data (though ERP waves are technically time series of amplitude). Maybe the LKJ prior is not appropriate (I don’t know of any possible replacement). Mathematically I’m a bit at a loss, as I’m not able to say what could produce a posterior that is so problematic to sample from and yet gives coherent results. When I replace the LKJ prior with a diagonal matrix of ones (tt.eye(E)), the model converges well (the drawback is that all the information about correlations between electrodes is lost/unaccounted for).

Any help or comment will be sincerely appreciated. Many thanks in advance.

If you want to induce the covariance matrix in cov between the elements of g, I think you want the product of chol and g_raw instead.
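The difference matters: if z is a vector of iid standard normals, then chol @ z has covariance chol @ chol.T = cov, whereas cov @ z has covariance cov @ cov.T (the covariance "squared"). A quick numpy check of this, on a toy covariance rather than the actual model:

```python
import numpy as np

rng = np.random.default_rng(0)
E, N = 4, 200_000

# A random SPD covariance matrix and its Cholesky factor
A = rng.normal(size=(E, E))
cov = A @ A.T + E * np.eye(E)
chol = np.linalg.cholesky(cov)

z = rng.normal(size=(E, N))  # iid standard normal draws

# chol @ z has (empirical) covariance close to chol @ chol.T = cov ...
assert np.allclose(np.cov(chol @ z), cov, rtol=0.05, atol=1.0)
# ... but cov @ z has covariance close to cov @ cov.T, not cov
assert np.allclose(np.cov(cov @ z), cov @ cov.T, rtol=0.05, atol=1.0)
```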

Also, here, I can’t quite tell what you mean to do with the variable a

a = pm.Normal('a', mu=0, sigma=0.5)

since it isn’t used later on, except in the Deterministic on the following line. It seems like you’re just reproducing g with extra Gaussian noise in the line m = pm.Deterministic('m', a + g).

Many thanks for the reply. The ‘g’ prior is an attempt at a non-centred multivariate parameterisation using the covariance matrix (cov), with the Cholesky decomposition already applied in the third line of the model:

cov = pm.Deterministic("cov", chol.dot(chol.T))

I suppose that the ‘chol’ parameter can be directly applied as you suggest, but I’m not sure whether it would be much different. Also, you make a very good point about the ‘a’ parameter. I had added it for exactly that purpose, to model noise that I didn’t know how to account for when not using the LKJ prior (i.e. tt.eye(E) diagonal matrix, as mentioned in my previous post). But maybe that was the wrong approach.

I sampled the model with the changes you suggested, but the convergence issues remain.

1 Like

Another thing to consider is that the scale parameter for the GRW is redundant - you are implicitly giving it a scale parameterization already in the chol / cov variable. I would recommend not using it here. Can you rerun it and share some of the trace plots for the elements of g with high rhat?
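The redundancy can be seen directly: multiplying chol by any constant c and dividing the walk by the same c leaves the product chol · g_raw unchanged, so the two scales are only jointly identified. A toy numpy illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
E, S = 3, 50

# A small Cholesky factor and a unit-scale random walk (cumulative sums)
chol = np.linalg.cholesky(2.0 * np.eye(E) + 0.5)
g_raw = np.cumsum(rng.normal(size=(S, E)), axis=0)

c = 3.7                          # any rescaling constant
g1 = chol @ g_raw.T
g2 = (c * chol) @ (g_raw / c).T  # identical product

assert np.allclose(g1, g2)  # the likelihood cannot tell the two apart
```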

1 Like

Thanks again. You’re right, that scale parameter is quite redundant. So I ran the model with the changes you suggested:

with pm.Model() as mod:
    sd = pm.Exponential.dist(0.5, shape=E)
    chol, corr, std = pm.LKJCholeskyCov("chol", n=E, eta=2.0, sd_dist=sd, compute_corr=True)
    g_raw = GRW("g_raw", shape=(S,E,G))
    g = pm.Deterministic('g', tt.dot(chol, g_raw.T))
    e = pm.Exponential('e', 0.05)+1
    like = pm.Normal("y", mu=g, sigma=e, observed=amps, shape=amps.shape)
 
with mod:
    trace = pm.sample(1000, tune=1000, chains=3, cores=6, init='adapt_diag')

Sorry, I ran fewer chains and samples, but I needed to save some time to run other things (in any case, previous models tended to get worse with more chains/samples). This time the R_hats improved slightly (the worst were near 1.3, still pretty bad), and the problematic parameters were mainly from ‘g_raw’ and ‘chol’. Here are some example plots, as you requested:

Frankly, I think you just need to run another 1000 samples or so. You’re estimating a lot of variables here, and I’m not surprised 1000+1000 isn’t enough. I’m impressed that it’s giving samples like these for a 32-dimensional correlated random walk. How long are your runs taking so far?

The only other thing I can recommend is using a different prior than pm.Exponential for the chol diagonal elements. You may get more mileage out of a HalfNormal prior with sd=0.5 or 1.0.
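For intuition (independent of the model itself), the two priors differ mainly in the right tail: an Exponential with lam=0.5 (mean 2) keeps far more mass on large standard deviations than a HalfNormal with sd=0.5. A quick comparison using scipy’s parameterisations:

```python
from scipy import stats

expon = stats.expon(scale=1 / 0.5)  # pm.Exponential(0.5): mean 2
halfn = stats.halfnorm(scale=0.5)   # pm.HalfNormal(sd=0.5)

# Tail mass above a few candidate standard deviations
for x in (1.0, 2.0, 4.0):
    print(f"P(sd > {x}): Exponential {expon.sf(x):.3f}, "
          f"HalfNormal {halfn.sf(x):.1e}")
```

The HalfNormal’s tail decays like exp(-x²), so it regularises the chol diagonal much more aggressively past a couple of standard deviations.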

I’m sorry about my previous (deleted) post, I had run the model with a very silly bug. I have now run the corrected model, with the priors you suggested:

with pm.Model() as mod:
    sd = pm.HalfNormal.dist(0.5, shape=E)
    chol, corr, std = pm.LKJCholeskyCov("chol", n=E, eta=2.0, sd_dist=sd, compute_corr=True)
    m_raw = GRW("m_raw", shape=(S,E,G))
    m = pm.Deterministic('m', tt.dot(chol, m_raw.T))
    s = pm.Exponential('s', 0.05)+1
    like = pm.Normal("y", mu=m, sigma=s, observed=amps, shape=amps.shape)
 
with mod:
    trace = pm.sample(2000, tune=2000, chains=4, cores=16, init='adapt_diag')

Note that I renamed ‘g_raw’ and ‘g’ to ‘m_raw’ and ‘m’ respectively. The model took a bit more than 2 hours to run. As I mentioned before, the model tends to get worse with more samples. Now the problematic R_hats go well over 1.4. Here are some examples:

Given how small the previous examples’ ESS were, this is not so surprising (I guess). Also, I don’t know whether a HalfNormal has better properties than an Exponential for the LKJ prior (this may have caused some issues as well). Finally, note the ‘sinking’ in the traceplots between 5000 and 6000 samples and the high autocorrelations. My impression is that something is fundamentally wrong with the model at a deeper level, but I cannot figure out what :thinking:. Many thanks for your help.

Finally, note the ‘sinking’ in the traceplots between 5000 and 6000 samples and the high autocorrelations

Since the chains are concatenated together in the outputs, that looks like the second 1000 post-tuning samples for chain 3 are off. If I had to guess, there are some identifiability issues arising because of the latent GRW which manifest themselves as sharp jumps between parameter values for m_raw.
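One quick way to see how a single jumping chain produces exactly this pattern is to compute split-R-hat by hand. A minimal numpy sketch of the standard diagnostic, using simulated chains standing in for the real trace:

```python
import numpy as np

def split_rhat(chains):
    """chains: (n_chains, n_draws) draws of one scalar parameter.
    Split each chain in half and compare between- to within-chain
    variance (the classic split Gelman-Rubin diagnostic)."""
    half = chains.shape[1] // 2
    halves = np.concatenate([chains[:, :half], chains[:, half:2 * half]])
    n = halves.shape[1]
    B = n * halves.mean(axis=1).var(ddof=1)  # between-chain variance
    W = halves.var(axis=1, ddof=1).mean()    # within-chain variance
    return np.sqrt(((n - 1) / n * W + B / n) / W)

rng = np.random.default_rng(2)
good = rng.normal(size=(4, 1000))  # four well-mixed chains
bad = good.copy()
bad[2, 500:] += 5.0                # one chain jumps to another mode mid-run

print(split_rhat(good))  # close to 1
print(split_rhat(bad))   # far above the usual 1.01 threshold
```

The jump only has to happen in one chain’s second half to blow up the between-chain variance, which matches the ‘sinking’ segment in the concatenated traceplot.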

From a conceptual standpoint and knowing nothing about the underlying science here, I would wonder if you could get away with a GRW of lower dimensionality. That would probably help with identifiability issues and also cut down on the compute.
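For what it’s worth, one concrete way to lower the dimensionality is a latent factor construction: put the random walk on K << E latent factors and map them to electrodes through a loading matrix, so the walk itself lives in K dimensions. A numpy shape sketch (illustrative only; K and the loadings here are hypothetical, not fit to any data):

```python
import numpy as np

rng = np.random.default_rng(3)
S, E, K = 282, 32, 5  # samples, electrodes, latent factors (K << E)

# K-dimensional latent random walk (what the GRW prior would cover)
latent = np.cumsum(rng.normal(size=(S, K)), axis=0)

# Loading matrix mapping factors to electrodes (in the PyMC model this
# would get its own prior, e.g. Normal(0, 1))
loadings = rng.normal(size=(E, K))

signal = latent @ loadings.T  # (S, E): 32 correlated walks from 5 factors
print(signal.shape)           # (282, 32)
```

The electrode-by-electrode correlations then come from the shared factors, with only S × K walk states to sample instead of S × E.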

The original model uses a simple diagonal matrix of ones as the GRW covariance, but it manages to converge and approximates each electrode’s signal appropriately. That’s already a small contribution to modelling ERPs as time series (it’s already going to be in a conference poster). In my experience, at least, there are very few Bayesian models addressing both time-samples and electrodes at the same time (if anybody knows of similar alternative models, that’d be great to read). An ideal model would take the covariance matrices at each time-point, but that seems quite unreachable at the moment. So my attempt was to design a model that can at least take the overall covariance/correlation amongst electrodes.

I’ve tried modelling each condition separately (a small dimensional reduction), but the model keeps failing. Another option is to model at the single-electrode level, which wouldn’t require a multivariate GRW (ergo no covariance matrix), but then all the information from the other electrodes (the electrode array) is lost. Finally, it would be possible to reduce the density of the electrode array (e.g. from 32 to 16 electrodes), but I’ve just run such a model and it didn’t converge. I’ll keep investigating this, and hopefully I’ll come up with a solution or alternative; if useful/relevant I can post results here. Thanks again for the suggestions.

1 Like

Hi again, and thanks again for the feedback. I have managed to solve the problem, or sort of. My reasoning was to go back to basics for a moment, i.e. to ponder whether event-related potentials (ERPs) can actually be modelled by a Gaussian random walk (GRW). Although ERPs have several sources of noise that may be random, i.e. from unknown or uncertain sources (such as unrelated brain processes, unidentified electrical sources in the environment, etc.), they are not a stochastic process on their own (they are postsynaptic electrical activity elicited by a specific stimulus in a controlled experiment, etc.). So maybe a GRW wasn’t the appropriate prior after all. I changed the GRW prior to a Normal distribution, so as to obtain a multivariate normal from the product with the covariance built from the Cholesky prior (using the Cholesky factor directly produced convergence problems). The resulting model looks like this:

with pm.Model() as mod: #Multivariate Normal model with LKJ prior
    sd = pm.HalfNormal.dist(1.0)
    chol, corr, std = pm.LKJCholeskyCov("chol", n=E, eta=6.0, sd_dist=sd, compute_corr=True)
    cov = pm.Deterministic("cov", chol.dot(chol.T))
    m = pm.Normal('m', mu=0, sigma=1.0)
    g = pm.Normal("g", mu=m, shape=(S,E,G))
    M = pm.Deterministic("M", tt.dot(cov, g.T))
    e = pm.HalfNormal("e", 5.0)+1.0
    y = pm.Normal("y", mu=M, sigma=e, observed=amps)

with mod:
    trace = pm.sample(2000, tune=2000, chains=4, cores=8, init='adapt_diag')

I ran some prior predictive checks to tune the values assigned to each prior, and it works best with a relatively high eta parameter for the Cholesky prior and a relatively high sigma for the likelihood’s HalfNormal. At a single electrode, the prior predictive looks like this:

This is quite reasonable, as an amplitude variation between -10uV and 10uV is to be expected for ERPs such as the P300. Naturally, the observations at the baseline should always be near zero (if baseline correction was applied), but the observations should make these priors adapt. Indeed, the model samples very well, with all ESS over 2000 and all R_hats ≈ 1. Here is an example of the approximated signal at electrode Pz:

And these are topographical maps of the scalp showing estimated activity averaged across a time window (350-750ms):

The only small issue I see is that the estimated peak of the Learners wave is slightly off mark, but not by much. The correlations from the Cholesky prior are way more sensible now, showing high correlations with adjacent electrodes, but negative correlations with frontal electrodes (as they tend towards the opposite polarity):

Sorry, I have added a lot of detail that may be unnecessary for those already familiar with the stats used in this model, but I thought it might come in handy for anyone who runs into this post while troubleshooting or searching for alternative models for ERP data. I hope it helps; I’m sure this could use some extra improvements. I’ll keep working on those, and if someone is interested, just give me a shout and I can post/share them (if I have any). Or if anyone has some advice, please let me know. Thanks in advance.

3 Likes